CN112717415A - Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game - Google Patents
- Publication number
- CN112717415A (application number CN202110091260.4A)
- Authority
- CN
- China
- Prior art keywords
- training
- model
- game
- reinforcement learning
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Electrically Operated Instructional Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an AI training method for a reinforcement learning fighting game based on the information bottleneck theory, comprising the following steps: 1) initializing an AI training model; 2) performing decision interaction in a simulation environment through the game AI to obtain a sample training batch data set; 3) iteratively training the AI training model with a reinforcement learning algorithm on the sample training batch data set obtained from the interaction between the game AI and the environment, and saving the parameters of the AI training model in stages; 4) fixing part of the saved parameters of the AI training models from different stages and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, to obtain the final AI training models for AIs of different levels and thus generate the fighting game AI files. Compared with the prior art, the method offers high sampling efficiency, fast training, flexible testing, and graded AI levels.
Description
Technical Field
The invention relates to the field of game AI learning, and in particular to a reinforcement learning fighting game AI training method based on the information bottleneck theory.
Background
With the development of deep learning in recent years, many achievements have been made in deep reinforcement learning, and methods that combine deep learning with reinforcement learning algorithms (such as DQN, A2C, PPO, and DDPG) have shown strong results on video game AI. In many reinforcement learning problems, however, the cost of interaction between the agent and the environment is high, so it is desirable to make the algorithm converge as fast as possible in order to save training cost, that is, to learn a higher-level intelligent strategy from the same amount of sampling.
In existing fighting games, the human-versus-machine mode is one of the important components. Existing game AIs are designed by manually specifying strategy distributions and targeted action mappings, so their play style is monotonous and lacks the flexibility of player-versus-player combat. Meanwhile, existing methods that train game AI with reinforcement learning take raw pixels as input, which carry a great deal of redundant information and degrade the learning efficiency of the network and the speed of the reinforcement learning algorithm. In deep learning experiments, a neural network first memorizes the input, increasing the mutual information between the variables of the input layer and the representation layer during training, and then compresses the input according to the specific learning task to discard useless redundant information, i.e., it reduces the mutual information between the input layer and the representation layer. This is the information E-C process, but existing reinforcement learning algorithms are not optimized for this information extraction process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an AI training method for a reinforcement learning battle game based on an information bottleneck theory.
The purpose of the invention can be realized by the following technical scheme:
an AI training method for a reinforcement learning battle game based on an information bottleneck theory comprises the following steps:
1) initializing an AI training model;
2) carrying out decision interaction in a simulation environment through a game AI to obtain a sample training batch data set;
3) iteratively training the AI training model with a reinforcement learning algorithm on the sample training batch data set obtained from interaction between the game AI and the environment, and saving the parameters of the AI training model in stages;
4) fixing part of the saved parameters of the AI training models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, to obtain the final AI training models for AIs of different levels and thus generate the fighting game AI files.
The step 1) specifically comprises the following steps:
11) determining the AI selectable operation set A and the number n of AI ability levels required, according to the game operation description;
12) initializing all network parameters in the model, including the value network model parameter θ and the policy network model parameter φ;
13) determining the hyper-parameters β and η according to the resolution of the game frames;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
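For illustration only, the initialization of steps 11)–15) might be sketched as below. This is not the patented implementation: the class name `Config`, the layer sizes, and the He-style initialization are all assumptions, and the value and policy networks are reduced to small dense layers (the embodiment uses a CNN, FIG. 3).

```python
from dataclasses import dataclass
import random

@dataclass
class Config:
    # step 11): selectable operation set A and number n of ability levels
    actions: tuple = ("left", "right", "jump", "attack", "defend", "skill")
    n_levels: int = 3
    # step 13): information-bottleneck hyper-parameters beta and eta
    beta: float = 1e-3
    eta: float = 1e-2
    # step 14): model learning rate
    lr: float = 7e-4
    # step 15): number m of samples drawn from the policy distribution
    m: int = 8

def init_layer(n_in, n_out, rng):
    """Step 12): initialize one dense layer (He-style scale, an assumption)."""
    scale = (2.0 / n_in) ** 0.5
    w = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return {"w": w, "b": [0.0] * n_out}

cfg = Config()
rng = random.Random(0)
theta = [init_layer(64, 32, rng), init_layer(32, 1, rng)]               # value net
phi = [init_layer(64, 32, rng), init_layer(32, len(cfg.actions), rng)]  # policy net
```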
The step 2) specifically comprises the following steps:
21) sampling environment sample batch data D = {(X_t, A_t, R_t, X_{t+1})}, t = 1, …, K, from the environment, where X_t and X_{t+1} are the game frame sampled at the current time t and the game frame at the next time t+1, A_t is the game AI's selectable operation set at the current time t, R_t is the environment reward for the selected operation, i.e., the game score and the real-time attributes of the game characters, and K is the total number of sampling moments;
22) for each sampled game frame X_t, the game AI obtains m operation samples according to the policy network model, each being an AI real-time operation sampled from the distribution P_φ(·|X_t) given by the policy network model, where P_φ(·|X_t) is a probability distribution model;
23) integrating the environment sample batch data with the m operation samples obtained by the AI from the policy network model, to obtain the sample training batch data set.
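The interaction loop of steps 21)–23) can be sketched as follows; the environment and policy here are stubs standing in for the game simulator and P_φ(·|X_t), and every identifier is an assumption, not part of the patent.

```python
import random

rng = random.Random(42)
ACTIONS = ["left", "right", "attack", "defend"]

def env_step(x_t, a_t):
    """Stub environment: returns reward R_t and next frame X_{t+1}."""
    reward = rng.uniform(-1.0, 1.0)              # stands in for score/attribute change
    x_next = [v + rng.gauss(0, 0.1) for v in x_t]
    return reward, x_next

def policy_probs(x_t):
    """Stub policy network P_phi(.|X_t): a fixed categorical distribution."""
    return [0.4, 0.3, 0.2, 0.1]

def collect_batch(k=5, m=3):
    """Steps 21)-23): sample K transitions, draw m operations per frame,
    and integrate everything into one training batch D."""
    x_t = [0.0] * 4
    batch = []
    for _ in range(k):
        probs = policy_probs(x_t)
        # step 22): m operation samples from the policy distribution
        ops = rng.choices(range(len(ACTIONS)), weights=probs, k=m)
        a_t = ops[0]                             # act with one sampled operation
        r_t, x_next = env_step(x_t, a_t)
        # step 23): integrate environment data with the m sampled operations
        batch.append({"X_t": x_t, "A_t": a_t, "R_t": r_t,
                      "X_next": x_next, "ops": ops})
        x_t = x_next
    return batch

D = collect_batch()
```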
In the step 3), an AI training model is iteratively trained by adopting an A2C algorithm.
In step 3), when the model is iteratively trained, the gradient of the model representation layer is calculated by a formula whose left-hand side is the gradient of the model representation layer and in which P(X) is the distribution probability of game frame X, φ(Z_i|X) is the probability of AI operation Z_i when the game frame obtained by the policy network model is X, and E denotes expectation.
In step 3), when the model is iteratively trained, the gradient of the reinforcement learning algorithm is calculated by a formula whose left-hand side is the gradient of the reinforcement learning algorithm and in which J(Z; θ) is the loss function of the A2C algorithm with an added information bottleneck loss term.
The loss function J(Z; θ) combines the existing loss function of the A2C algorithm framework with the information bottleneck term, where R is the real-time reward value, i.e., the real-time change of the game score and the game character attributes, α is the reward decay coefficient in the A2C algorithm, Θ_Φ(Z_t, X) is the real-time value estimate, produced by the value network model, of the state-decision pair (Z_t, X) when the policy network model is φ, and H(P_φ(a_t|X_t; θ)) is the entropy of the distribution P_φ(a_t|X_t; θ).
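As a rough sketch of the loss structure described here — an advantage actor-critic term plus an entropy term and a mutual-information (bottleneck) penalty — consider the following single-transition computation. The exact formula is not reproduced in this text; the weighting names `beta` and `eta`, and the MI surrogate (the log-probability of the sampled operation, a variational-bound-style term) are assumptions.

```python
import math

def a2c_ib_loss(log_prob, value, value_next, reward,
                probs, alpha=0.99, eta=0.01, beta=0.001):
    """One-step A2C loss with an entropy bonus and an information-bottleneck
    style penalty. `probs` is the policy distribution P_phi(.|X_t)."""
    # TD target and advantage (alpha is the reward decay coefficient)
    target = reward + alpha * value_next
    advantage = target - value
    policy_loss = -log_prob * advantage          # actor term
    value_loss = advantage ** 2                  # critic term
    # entropy H(P_phi(.|X_t)) encourages exploration
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # assumed MI surrogate: E[log P_phi(z|x)] grows as the encoding becomes
    # more deterministic, so penalizing it limits I(X; Z)
    ib_penalty = log_prob
    return policy_loss + 0.5 * value_loss - eta * entropy + beta * ib_penalty

loss = a2c_ib_loss(log_prob=math.log(0.4), value=0.2, value_next=0.3,
                   reward=1.0, probs=[0.4, 0.3, 0.2, 0.1])
```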
In step 3), when the model is iteratively trained with the reinforcement learning algorithm, the network parameters θ and φ are updated separately, each according to its own update expression, and checked for convergence: if they have not converged, training is repeated; once they converge, training stops and the model is saved in stages.
the step 4) specifically comprises the following steps:
41) taking out the n−1 intermediate models M_j (j = 1, 2, …, n−1) saved during AI training, according to the required number n of AI levels, each intermediate model comprising a value network model and a policy network model;
42) fixing part of the parameters of the intermediate model M_j, then retraining in the same way as in the AI training process and updating the fully connected layer parameters of M_j until convergence, and saving the converged model parameters;
43) taking out the policy network models from the n−1 retrained intermediate models M_1, M_2, …, M_{n−1} of step 42) and the final converged model M_n obtained in the initial training, and generating from them the n game AI strategy distribution files F_1, F_2, …, F_n corresponding to different ability levels;
44) using the game AI strategy distribution files F_1, F_2, …, F_n as the real-time operation strategies of fighting AIs at n different ability levels, and merging them with the other code files to construct the fighting AIs of n different ability levels.
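The freeze-then-fine-tune idea of steps 41)–44) — holding one part of a saved model fixed while retraining the rest — reduces to a masked parameter update, sketched framework-free below. In PyTorch (the embodiment's framework) the equivalent is setting `requires_grad = False` on the frozen layers; the dictionary layout here is an assumption.

```python
def sgd_step(params, grads, frozen, lr=0.01):
    """Update only non-frozen parameter groups; frozen ones (e.g. the
    convolutional layers of an intermediate model M_j) stay fixed."""
    return {
        name: (vals if name in frozen
               else [v - lr * g for v, g in zip(vals, grads[name])])
        for name, vals in params.items()
    }

# an assumed intermediate model: "conv" frozen, "fc" fine-tuned
params = {"conv": [0.5, -0.2], "fc": [0.1, 0.3]}
grads = {"conv": [1.0, 1.0], "fc": [1.0, -1.0]}
new_params = sgd_step(params, grads, frozen={"conv"})
```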
In step 3), when the AI training model is iteratively trained with the A2C algorithm, the expectation is fitted by the Monte Carlo estimation method, and the gradient update uses the Stein variational gradient descent method.
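Stein variational gradient descent, named here as the gradient-update method, transports a set of particles along a kernelized Stein direction. A minimal 1-D sketch on a standard-normal target follows; the RBF bandwidth, step size, and target are illustrative assumptions, not the patent's representation-layer distribution.

```python
import math

def svgd_step(particles, score, h=1.0, step=0.1):
    """One SVGD update: each particle moves by the mean over j of
    k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i), with an RBF kernel k."""
    n = len(particles)
    updated = []
    for xi in particles:
        drift = 0.0
        for xj in particles:
            k = math.exp(-(xj - xi) ** 2 / (2 * h))
            dk = (xi - xj) / h * k   # gradient of the RBF kernel w.r.t. x_j
            drift += k * score(xj) + dk
        updated.append(xi + step * drift / n)
    return updated

# target: standard normal, so the score is d log p / dx = -x
particles = [-2.0, -1.0, 0.5, 2.5]
for _ in range(50):
    particles = svgd_step(particles, score=lambda x: -x)
```

The repulsive kernel-gradient term keeps the particles spread out, which is what distinguishes SVGD from running plain gradient ascent on each sample independently.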
Compared with the prior art, the invention has the following advantages:
Firstly, existing policy gradient algorithms such as PPO, A2C and DDPG focus on the convergence of the reinforcement learning algorithm itself and do not consider the extraction of information from the environment state into the value function.
Secondly, the invention obtains the optimized gradient by the Stein variational gradient descent method and uses a lower bound to approximate the unknown distribution, solving the problem that, in the information bottleneck setting, the probability distribution of the representation layer conditioned on the input layer cannot be computed.
Thirdly, compared with a traditionally hand-designed AI, the fighting game AI designed by the invention can generate more combat strategies according to the real-time dynamic state of the game and is more flexible in actual testing.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a diagram of the model architecture of the present invention.
FIG. 3 is a block diagram of an AI training model according to the present invention.
Fig. 4 shows a specific embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
As shown in fig. 1, the invention provides an AI training method for a reinforcement learning battle game based on an information bottleneck theory, comprising the following steps:
1) initializing the network parameters and hyper-parameters of the AI training model (a CNN model is adopted in this example; the specific model structure is shown in FIG. 3), and setting the learning rate and the number of samples drawn from the parameter distribution;
2) performing decision interaction in a simulation environment through AI to obtain a sample training batch data set;
3) iteratively training the AI training model with a reinforcement learning algorithm (the A2C algorithm in this example) on the sample training batch data set obtained from interaction between the AI and the environment, and saving the model parameters in stages;
4) fixing part of the saved parameters of the models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, to obtain the final strategy models for AIs of different levels.
The specific process of each step is as follows:
the step 1) specifically comprises the following steps:
11) determining the AI selectable operation set A (taking a street fighting game as an example, this includes moving up/down/left/right, attack operations, defense operations, skill moves, and the like) and the number n of AI ability levels required, according to the game operation description;
12) initializing all network parameters θ and φ of the value network model and the policy network model;
13) determining the hyper-parameters β and η according to the resolution of the game frames;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
The step 2) specifically comprises the following steps:
21) sampling a batch of data from the environment: D = {(X_t, A_t, R_t, X_{t+1})}, where X_t and X_{t+1} are the game frame at the current moment and at the next moment respectively, A_t is the game AI's selectable operation set (including moving up/down/left/right, attack, defense and skill-move operations in the street fighting game), and R_t is the environment reward for the state-behavior pair (including the change in health points, the change in magic points, the distance to the opponent, skill cooldown times, the real-time game score, and the like).
22) for each game frame X_t sampled in real time, the game AI draws m samples from the policy network model, each being an AI real-time operation sampled from the distribution given by the policy network (including how to move, whether to defend, how to attack, and the like).
23) integrating the environment sample batch data with the data sampled by the AI to obtain the training batch data D.
In step 3), the gradient of the representation layer over the sample training batch data D is calculated by a formula in which Z_i is a real-time operation sample drawn by the agent from the probability model P_φ(·|X_t), P(X) is the distribution probability of game frame X, and φ(Z_i|X) is the probability of AI operation Z_i when the game frame obtained by the policy network model is X.
The gradient of the reinforcement learning algorithm is calculated by a formula in which Z_i is a real-time operation sample drawn by the agent from the probability model P_φ(·|X_t), and J(Z; θ) is the loss function of the A2C algorithm with the added information bottleneck loss term. In J(Z; θ), R is the real-time reward value (covering the change in health points ΔHP, the change in magic points ΔMP, the distance d to the opponent, skill cooldown time, the real-time game score Score, and the like; the specific expression can be designed as required, one example given here being R = ΔHP + ΔMP + ΔHP + Score), α is the reward decay coefficient in the algorithm, adjustable according to the design requirements, Θ_Φ(Z_t, X) is the real-time value estimate, produced by the value network model, of the state-decision pair (Z_t, X) when the policy network model is φ, and H(P) is the entropy of the distribution P. Throughout the calculation, the expectation E is fitted by the Monte Carlo estimation method, and the gradient optimization is computed by the Stein variational gradient descent method.
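The reward construction described in this paragraph can be written as a small function. The weights below are assumptions (the text notes the expression is designed per requirements), and the argument names merely mirror the quantities listed: health change, magic change, distance to the opponent, and score change.

```python
def compute_reward(delta_hp, delta_mp, dist_to_opponent, score_delta,
                   w=(1.0, 0.5, -0.01, 1.0)):
    """Shaped reward over the quantities the embodiment lists; the weights
    are assumptions and would be tuned per design requirements."""
    return (w[0] * delta_hp + w[1] * delta_mp
            + w[2] * dist_to_opponent + w[3] * score_delta)

r = compute_reward(delta_hp=10.0, delta_mp=-2.0, dist_to_opponent=50.0,
                   score_delta=100.0)
```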
The A2C algorithm adopted in this embodiment is implemented mainly in PyTorch; the specific AI training model architecture is shown in FIG. 2.
The steps for updating the model parameters are:
31) updating the value network parameter θ according to its update rule;
32) updating the policy network parameter φ according to its update rule;
33) judging from the magnitude of the parameter updates whether the model parameters have converged; if not, training restarts from step 2), and once convergence is reached, training stops and the model is saved. In particular, throughout training, all network model parameters are saved in time order once every 100 parameter updates.
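The staged saving of step 33) — snapshotting all network parameters, in time order, once every 100 updates — might look like this sketch; the class name and parameter layout are assumptions.

```python
import copy

class StagedSaver:
    """Keep time-ordered snapshots of model parameters, one per 100 updates,
    so intermediate models M_1..M_{n-1} can be recovered later."""
    def __init__(self, every=100):
        self.every = every
        self.snapshots = []   # time-ordered list of parameter copies
        self.updates = 0

    def after_update(self, params):
        self.updates += 1
        if self.updates % self.every == 0:
            self.snapshots.append(copy.deepcopy(params))

saver = StagedSaver(every=100)
params = {"w": 0.0}
for step in range(350):
    params["w"] += 0.1          # stand-in for a real gradient update
    saver.after_update(params)
```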
Step 4) fixes part of the saved parameters of the models from different stages and retrains the remaining parameters with the reinforcement learning algorithm for fine-tuning, obtaining the final strategy models for AIs of different levels.
The method specifically comprises the following steps:
41) taking out the n−1 intermediate models M_i (i = 1, 2, …, n−1) saved during AI training, according to the required number n of AI levels, where each M_i comprises both a value network model and a policy network model;
42) fixing the convolutional layer parameters of the model M_i, then retraining in the same way as in the AI training process and updating the fully connected layer parameters of M_i until convergence, and saving the converged model parameters;
43) taking out the policy network models from the n−1 retrained models M_1, M_2, …, M_{n−1} of step 42) and the final converged model M_n obtained in the initial training, yielding the n game AI strategy distribution files F_1, F_2, …, F_n corresponding to different ability levels;
44) using the strategy distribution files F_1, F_2, …, F_n as the real-time operation strategies of fighting AIs at n different ability levels, and merging them with the other code files to construct the n fighting AIs of different ability levels.
By introducing the information bottleneck theory, the method adds a mutual information penalty term to the loss function of the A2C algorithm to accelerate the AI training process, and re-smooths the output strategies of AIs of different levels by pre-training and then fine-tuning the model parameters. Compared with the prior art, the invention achieves high sampling efficiency during training, further accelerates the training of fighting game AIs of different ability levels, and is sufficiently flexible.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A reinforcement learning fighting game AI training method based on an information bottleneck theory is characterized by comprising the following steps:
1) initializing an AI training model;
2) carrying out decision interaction in a simulation environment through a game AI to obtain a sample training batch data set;
3) according to a sample training batch data set obtained by interaction of the game AI and the environment, iteratively training an AI training model by adopting a reinforcement learning algorithm, and storing parameters of the AI training model in stages;
4) fixing part of the saved parameters of the AI training models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, to obtain the final AI training models for AIs of different levels and thus generate the fighting game AI files.
2. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 1, wherein the step 1) specifically comprises the following steps:
11) determining an AI optional operation set A and the number n of AI with different required capability levels according to the game operation description;
12) initializing all network parameters in the model, including the value network model parameter θ and the policy network model parameter φ;
13) determining the hyper-parameters β and η according to the resolution of the game frames;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
3. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 2, wherein the step 2) specifically comprises the following steps:
21) sampling environment sample batch data D = {(X_t, A_t, R_t, X_{t+1})}, t = 1, …, K, from the environment, where X_t and X_{t+1} are the game frame sampled at the current time t and the game frame at the next time t+1, A_t is the game AI's selectable operation set at the current time t, R_t is the environment reward for the selected operation, i.e., the game score and the real-time attributes of the game characters, and K is the total number of sampling moments;
22) for each sampled game frame X_t, the game AI obtains m operation samples according to the policy network model, each being an AI real-time operation sampled from the distribution P_φ(·|X_t) given by the policy network model, where P_φ(·|X_t) is a probability distribution model;
4. The AI training method for reinforcement learning battle game based on information bottleneck theory as claimed in claim 1, wherein in the step 3), the AI training model is iteratively trained by using A2C algorithm.
5. The AI training method for a reinforcement learning fighting game based on the information bottleneck theory according to claim 4, wherein in step 3), when the model is iteratively trained, the gradient of the model representation layer is calculated as:
6. The AI training method for an reinforcement learning battle game based on an information bottleneck theory according to claim 5, wherein in the step 3), when the model is iteratively trained, the gradient of the reinforcement learning algorithm is calculated as:
7. The AI training method of an information bottleneck theory based reinforcement learning battle game as claimed in claim 6, wherein the loss function J (Z; θ) is expressed as:
wherein the content of the first and second substances,for impairments in the framework of the A2C algorithmThe loss function, R is the real-time prize value, i.e. the real-time change of the game credits and the game character attributes, alpha is the prize attenuation coefficient in the A2C algorithm, thetaΦ(ZtX) is a state decision pair (Z) estimated using the value network model when the policy network model is phitX) real-time value estimation.
8. The AI training method for the reinforcement learning fighting game based on the information bottleneck theory according to claim 7, wherein in step 3), when the model is iteratively trained with the reinforcement learning algorithm, the network parameters θ and φ are updated separately, each according to its own update expression, and checked for convergence; if they have not converged, training is repeated, and once convergence is reached, training stops and the model is saved in stages.
9. the AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 1, wherein the step 4) specifically comprises the following steps:
41) taking out the n−1 intermediate models M_j (j = 1, 2, …, n−1) saved during AI training, according to the required number n of AI levels, each intermediate model comprising a value network model and a policy network model;
42) fixing the convolutional layer parameters of the intermediate model M_j, then retraining in the same way as in the AI training process and updating the fully connected layer parameters of M_j until convergence, and saving the converged model parameters;
43) taking out the policy network models from the n−1 retrained intermediate models M_1, M_2, …, M_{n−1} of step 42) and the final converged model M_n obtained in the initial training, and generating from them the n game AI strategy distribution files F_1, F_2, …, F_n corresponding to different ability levels;
44) using the game AI strategy distribution files F_1, F_2, …, F_n as the real-time operation strategies of fighting AIs at n different ability levels, and merging them with the other code files to construct the fighting AIs of n different ability levels.
10. The AI training method for the reinforcement learning fighting game based on the information bottleneck theory according to claim 4, wherein in step 3), when the AI training model is iteratively trained with the A2C algorithm, the expectation is fitted by the Monte Carlo estimation method and the gradient is updated by the Stein variational gradient descent method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110091260.4A CN112717415B (en) | 2021-01-22 | 2021-01-22 | Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110091260.4A CN112717415B (en) | 2021-01-22 | 2021-01-22 | Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112717415A true CN112717415A (en) | 2021-04-30 |
CN112717415B CN112717415B (en) | 2022-08-16 |
Family
ID=75595220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110091260.4A Active CN112717415B (en) | 2021-01-22 | 2021-01-22 | Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112717415B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113641905A (en) * | 2021-08-16 | 2021-11-12 | 京东科技信息技术有限公司 | Model training method, information pushing method, device, equipment and storage medium |
CN114970714A (en) * | 2022-05-26 | 2022-08-30 | 哈尔滨工业大学 | Trajectory prediction method and system considering uncertain behavior mode of moving target |
CN116109525A (en) * | 2023-04-11 | 2023-05-12 | 北京龙智数科科技服务有限公司 | Reinforcement learning method and device based on multidimensional data enhancement |
CN116808590A (en) * | 2023-08-25 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Data processing method and related device |
CN117162102A (en) * | 2023-10-30 | 2023-12-05 | 南京邮电大学 | Independent near-end strategy optimization training acceleration method for robot joint action |
CN113269315B (en) * | 2021-06-29 | 2024-04-02 | 安徽寒武纪信息科技有限公司 | Apparatus, method and readable storage medium for performing tasks using deep reinforcement learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
US20180207535A1 (en) * | 2017-01-24 | 2018-07-26 | Line Corporation | Method, apparatus, computer program and recording medium for providing game service |
CN109923560A (en) * | 2016-11-04 | 2019-06-21 | 谷歌有限责任公司 | Neural network is trained using variation information bottleneck |
CN110327624A (en) * | 2019-07-03 | 2019-10-15 | 广州多益网络股份有限公司 | A kind of game follower method and system based on course intensified learning |
CN111886059A (en) * | 2018-03-21 | 2020-11-03 | 威尔乌集团 | Automatically reducing use of cheating software in an online gaming environment |
CN111985640A (en) * | 2020-07-10 | 2020-11-24 | 清华大学 | Model training method based on reinforcement learning and related device |
CN112169311A (en) * | 2020-10-20 | 2021-01-05 | 网易(杭州)网络有限公司 | Method, system, storage medium and computer device for training AI (Artificial Intelligence) |
CN112221152A (en) * | 2020-10-27 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Artificial intelligence AI model training method, device, equipment and medium |
- 2021-01-22 — CN application CN202110091260.4A, patent CN112717415B, status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
CN109923560A (en) * | 2016-11-04 | 2019-06-21 | Google LLC | Training neural networks using a variational information bottleneck |
US20180207535A1 (en) * | 2017-01-24 | 2018-07-26 | Line Corporation | Method, apparatus, computer program and recording medium for providing game service |
CN111886059A (en) * | 2018-03-21 | 2020-11-03 | Valve Corporation | Automatically reducing use of cheating software in an online gaming environment |
CN110327624A (en) * | 2019-07-03 | 2019-10-15 | Guangzhou Duoyi Network Co., Ltd. | Game-following method and system based on curriculum reinforcement learning |
CN111985640A (en) * | 2020-07-10 | 2020-11-24 | Tsinghua University | Model training method based on reinforcement learning and related device |
CN112169311A (en) * | 2020-10-20 | 2021-01-05 | NetEase (Hangzhou) Network Co., Ltd. | Method, system, storage medium and computer device for training AI (Artificial Intelligence) |
CN112221152A (en) * | 2020-10-27 | 2021-01-15 | Tencent Technology (Shenzhen) Co., Ltd. | Artificial intelligence (AI) model training method, apparatus, device and medium |
Non-Patent Citations (1)
Title |
---|
Huang Xueyu, Guo Qin: "A game algorithm combining environment models with deep reinforcement learning", Journal of Jiangxi University of Science and Technology, vol. 39, no. 3, 30 June 2018 (2018-06-30), pages 84-89 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269315B (en) * | 2021-06-29 | 2024-04-02 | Anhui Cambricon Information Technology Co., Ltd. | Apparatus, method and readable storage medium for performing tasks using deep reinforcement learning |
CN113641905A (en) * | 2021-08-16 | 2021-11-12 | JD Technology Information Technology Co., Ltd. | Model training method, information pushing method, device, equipment and storage medium |
CN113641905B (en) * | 2021-08-16 | 2023-10-03 | JD Technology Information Technology Co., Ltd. | Model training method, information pushing method, device, equipment and storage medium |
CN114970714A (en) * | 2022-05-26 | 2022-08-30 | Harbin Institute of Technology | Trajectory prediction method and system considering uncertain behavior mode of moving target |
CN114970714B (en) * | 2022-05-26 | 2024-05-03 | Harbin Institute of Technology | Trajectory prediction method and system considering uncertain behavior mode of moving target |
CN116109525A (en) * | 2023-04-11 | 2023-05-12 | Beijing Longzhi Shuke Technology Service Co., Ltd. | Reinforcement learning method and device based on multidimensional data enhancement |
CN116109525B (en) * | 2023-04-11 | 2024-01-05 | Beijing Longzhi Shuke Technology Service Co., Ltd. | Reinforcement learning method and device based on multidimensional data enhancement |
CN116808590A (en) * | 2023-08-25 | 2023-09-29 | Tencent Technology (Shenzhen) Co., Ltd. | Data processing method and related device |
CN116808590B (en) * | 2023-08-25 | 2023-11-10 | Tencent Technology (Shenzhen) Co., Ltd. | Data processing method and related device |
CN117162102A (en) * | 2023-10-30 | 2023-12-05 | Nanjing University of Posts and Telecommunications | Independent proximal policy optimization training acceleration method for robot joint actions |
Also Published As
Publication number | Publication date |
---|---|
CN112717415B (en) | 2022-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112717415B (en) | Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game | |
CN112668235B (en) | Robot control method based on off-line model pre-training learning DDPG algorithm | |
Hester et al. | Deep q-learning from demonstrations | |
CN111766782B (en) | Strategy selection method based on Actor-Critic framework in deep reinforcement learning | |
Kurin et al. | The Atari grand challenge dataset | |
CN111856925B (en) | State trajectory-based confrontation type imitation learning method and device | |
CN111008449A (en) | Acceleration method for deep reinforcement learning deduction decision training in battlefield simulation environment | |
CN109284812B (en) | Video game simulation method based on improved DQN | |
CN111882476B (en) | Image steganography method for automatic learning embedding cost based on deep reinforcement learning | |
CN113952733A (en) | Multi-agent self-adaptive sampling strategy generation method | |
CN111282272B (en) | Information processing method, computer readable medium and electronic device | |
CN113947022B (en) | Model-based proximal policy optimization method | |
Zhao et al. | Handling large-scale action space in deep Q network | |
CN116090549A (en) | Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium | |
CN112257348B (en) | Method for predicting long-term degradation trend of lithium battery | |
CN114169385A (en) | MSWI process combustion state identification method based on mixed data enhancement | |
Tong et al. | Enhancing rolling horizon evolution with policy and value networks | |
KR102209917B1 (en) | Data processing apparatus and method for deep reinforcement learning | |
CN116596059A (en) | Multi-agent reinforcement learning method based on priority experience sharing | |
CN116204849A (en) | Data and model fusion method for digital twin application | |
CN115293361A (en) | Rainbow agent training method based on curiosity mechanism | |
CN115964898A (en) | Wargame-confrontation-oriented BC-QMIX online multi-agent behavior decision modeling method | |
Liu et al. | Forward-looking imaginative planning framework combined with prioritized-replay double DQN | |
CN113469904A (en) | General image quality enhancement method and device based on cycle consistency loss | |
Zhang et al. | Side-Scrolling Platform Game Levels Reachability Repair Method and Its Applications to Super Mario Bros |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||