CN110772794B - Intelligent game processing method, device, equipment and storage medium - Google Patents
- Publication number
- CN110772794B (application CN201910966855.2A)
- Authority
- CN
- China
- Prior art keywords
- game
- action
- game state
- state data
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an intelligent game processing method, which comprises the following steps: acquiring game state data of the current player of an intelligent game and simulating the intelligent game; searching, according to a preset Monte Carlo search tree, for the action to be executed for the current player's game state data until the whole round of the game is finished, wherein the game data generated by each action found by the search is used as the current player's game state data in the next search; storing the game information generated after each action executed in the whole round of the game into a database; and optimizing the loss functions of a pre-constructed value network and a pre-constructed action probability network by a back propagation method, using sample data drawn from the database, to obtain an optimal value network and an optimal action probability network. Embodiments of the invention also disclose an intelligent game processing apparatus, device and storage medium. The embodiments address the problems of inaccurate action decisions of non-player characters and reduced game quality in games with incomplete information.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing an intelligent game.
Background
In a turn-based game, players act in turns and can only operate when their own turn arrives. Such games are highly engaging and have become one of the main forms of casual entertainment. The action decisions of non-player characters in a turn-based game are often an important factor in game quality and user experience. Traditional game artificial intelligence is implemented with a state machine or a behavior tree, and the corresponding operations of the non-player character are triggered by different states.
With the success of AlphaGo at the game of Go, reinforcement learning has been applied to games more and more widely. However, board games such as Go have the characteristic of complete information: the states of both the opponent and one's own side are fully disclosed, with no hidden information or random factors. In a game with incomplete information, a player cannot fully know the opponent's information, the effects of actions are random, and many uncertain factors exist, which makes reinforcement learning difficult to train; as a result, the action decisions of non-player characters are inaccurate and game quality is reduced.
Disclosure of Invention
The embodiments of the present invention provide an intelligent game processing method, apparatus, device and storage medium, which can effectively solve the prior-art problems of inaccurate action decisions of non-player characters and reduced game quality when facing games with incomplete information.
An embodiment of the present invention provides an intelligent game processing method, including:
acquiring game state data of a current player of an intelligent game, and simulating the intelligent game;
searching actions correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each execution of the search are taken as the game state data of the current player in the next search;
storing game information generated correspondingly after each action executed by the whole round of game into a database;
optimizing a pre-constructed value network and a loss value function of a pre-constructed action probability network by adopting a back propagation method according to sampling data acquired from the database to obtain an optimal value network and an optimal action probability network;
wherein the game information includes: the game state data of the current player, the probability label of executing each action corresponding to the game state data of the current player, the value label of the current game state and the game result when the whole round of game is finished;
the loss value functions of the pre-constructed value network and the pre-constructed action probability network are specifically as follows:
L = αL_v + βL_p + ηwᵀw;
wherein L_v is the loss function of the value network, α is the coefficient of the loss function of the value network, L_p is the loss function of the action probability network, β is the coefficient of the loss function of the action probability network, wᵀw is the regularization penalty term, and η is the coefficient of the regularization penalty term.
As an improvement of the above solution, the searching, according to the preset Monte Carlo search tree, for the action to be executed for the current player's game state data until the whole round of the game is finished, wherein the game data generated by each found action is used as the current player's game state data in the next search, specifically comprises:
determining a node corresponding to the game state data of the current player;
searching at least one corresponding child node according to the node corresponding to the game state data of the current player;
and selecting the best child node from at least one corresponding child node as an action executed under the game state data of the current player.
As an improvement of the above solution, the selecting of the best child node from the at least one corresponding child node as the action executed under the current player's game state data specifically comprises:
selecting the child node with the largest number of visits as the optimal child node;
or,
selecting the optimal child node from the at least one corresponding child node according to a preset selection probability;
or,
selecting the child node with the maximum confidence value, wherein the confidence value is UCB = E(Q(v)) + c·√(2·ln N(v′) / N(v)), in which N(v) is the number of visits of the current child node, N(v′) is the number of visits of its parent node, and E(Q(v)) is the average benefit obtained by the current child node over the whole round of the game.
As an improvement of the above solution, the simulating the intelligent game specifically includes:
the visible information is kept unchanged, and the non-visible information is subjected to random generation operation.
As an improvement of the above solution, after searching for the action to be executed for the current player's game state data according to the preset Monte Carlo search tree until the whole round of the game is finished, and before saving the game information generated after each action executed in the whole round of the game into the database, the method further includes:
and carrying out back propagation on the nodes corresponding to the game state data of at least one current player to respectively calculate the profit value and the access times corresponding to each current game state.
As an improvement of the above solution, the method further includes:
if the action to be executed for the current player's game state data cannot be found according to the preset Monte Carlo search tree and the whole round of the game is not finished, the actions that may be executed for the current player's game state data are expanded through a preset neural network, and the probability of each possible action is recorded.
Another embodiment of the present invention correspondingly provides an intelligent game processing apparatus, including:
the acquisition module is used for acquiring game state data of a current player of the intelligent game and simulating the intelligent game;
the searching matching module is used for searching actions correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each execution of the search are taken as the game state data of the current player in the next search;
the storage module is used for storing game information generated correspondingly after each action executed by the whole round of game into the database;
the construction module is used for optimizing a pre-constructed value network and a loss value function of a pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to acquire an optimal value network and an optimal action probability network;
wherein the game information includes: the game state data of the current player, the probability label of executing each action corresponding to the game state data of the current player, the value label of the current game state and the game result when the whole round of game is finished;
the loss value functions of the pre-constructed value network and the pre-constructed action probability network are specifically as follows:
L = αL_v + βL_p + ηwᵀw;
wherein L_v is the loss function of the value network, α is the coefficient of the loss function of the value network, L_p is the loss function of the action probability network, β is the coefficient of the loss function of the action probability network, wᵀw is the regularization penalty term, and η is the coefficient of the regularization penalty term.
Another embodiment of the present invention provides a smart game processing device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the smart game processing method according to the embodiment of the present invention.
Another embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the intelligent game processing method described in the foregoing embodiment of the present invention.
Compared with the prior art, the intelligent game processing method, apparatus, device and storage medium disclosed by the embodiments of the invention acquire the game state data of the current player of an intelligent game and simulate the game; search, according to a preset Monte Carlo search tree, for the action to be executed for the current player's game state data, with the game data generated by each executed action used as the current game state when determining the action in the next search, until the whole round of the game is finished; save the game information generated after each action of the whole round of the game into a database; and optimize the loss functions of the preset value network and action probability network by a back propagation method using sample data obtained from the database, so as to obtain an optimal value network and an optimal action probability network. In a game with incomplete information, this simplifies the training process of reinforcement learning, makes the action decisions of non-player characters more accurate, improves game quality, enhances playability, and improves user experience.
Drawings
FIG. 1 is a flow chart of an intelligent game processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a simulation flow of a preset Monte Carlo search tree according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an intelligent game processing device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an intelligent game processing device according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of an intelligent game processing method according to an embodiment of the invention is shown.
The embodiment of the invention provides an intelligent game processing method, which comprises the following steps:
s10, acquiring game state data of a current player of the intelligent game, and simulating the intelligent game.
Wherein the game state data of the current player includes, for both players in the battle: class (sect) assignment, level, health points, magic points, equipment information, pet information, buff effects and debuff effects carried by the player, and the like. These data are combined into a vector as input.
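Purely for illustration (the patent does not prescribe a concrete encoding, so the field names and ordering below are assumptions), the combination of both players' state data into a single input vector can be sketched as:

```python
# Sketch: flatten the state of both players into one numeric input vector.
# Field names and their ordering are illustrative assumptions, not from the patent.
def state_to_vector(own: dict, opponent: dict) -> list:
    fields = ["sect", "level", "hp", "mp", "equipment_score", "pet_score",
              "buff_count", "debuff_count"]
    vec = []
    for player in (own, opponent):
        # missing fields default to 0 so the vector length stays fixed
        vec.extend(float(player.get(f, 0)) for f in fields)
    return vec

own = {"sect": 1, "level": 30, "hp": 850, "mp": 120}
opp = {"sect": 2, "level": 28, "hp": 790, "mp": 90}
v = state_to_vector(own, opp)  # 8 fields x 2 players = 16-dimensional vector
```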
In this embodiment, the game state data of the current player may be obtained from an established game data model. This is merely one implementation of this embodiment; the game state data of the current player may also be obtained in other ways.
S20, searching actions correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each time of searching is used as the game state data of the current player in the next searching.
Referring to fig. 2, the preset Monte Carlo search tree is constructed as follows: each node of the Monte Carlo search tree corresponds to a state in the game, and the child nodes of a node are the new states generated by taking each available action in that state, with one node per action. In this example, the actions available in each state include the player's different skills, the use of items, and the like. Because of the randomness of the game, taking the same action in the same state may produce different effects, while each child node corresponds only to the action taken.
Tree node attributes are defined in the Monte Carlo search tree; the node attributes include the index of the parent node, the indexes of all child nodes, the number of times the node has been visited, the action probability output by the pre-constructed action probability network for the current node, and the final benefit value of the game after this node, calculated according to the pre-constructed value network.
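As an illustrative sketch only, a tree node carrying the attributes listed above (parent reference, children, visit count, policy-network prior, and accumulated benefit) might look like this; the class and attribute names are assumptions, not from the patent:

```python
class TreeNode:
    """Monte Carlo tree node holding the attributes described above (sketch)."""
    def __init__(self, parent=None, prior=0.0):
        self.parent = parent      # reference to the parent node (None at the root)
        self.children = {}        # action -> child TreeNode
        self.visits = 0           # number of times this node has been visited
        self.prior = prior        # action probability from the policy network
        self.value_sum = 0.0      # accumulated benefit from the value network

    def mean_value(self) -> float:
        """Average benefit of this node; 0 before any visit."""
        return self.value_sum / self.visits if self.visits else 0.0

root = TreeNode()
root.children["attack"] = TreeNode(parent=root, prior=0.6)
```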
In this embodiment, within the simulation of one game, a new Monte Carlo tree may be created each time an action is to be performed, or the previously created Monte Carlo tree may be reused, in which case the game data generated by the action actually taken is regarded as the current player's game state data at the root node; this continues until the current game is finished, and a new Monte Carlo tree is created at the beginning of the next game.
Specifically, when the next action to be executed needs to be determined from the current player's game state data, the action is searched for through the preset Monte Carlo tree, and the game data generated by the found action is used as the current player's game state data for the next search, until the simulated intelligent game ends.
S30, saving the game information generated after each action executed in the whole round of the game into a database; wherein the game information includes: the game state data of the current player, the probability labels of each action executed for the current player, and the value labels of the current player's game state.
In this embodiment, the value label of the current game state records victory, defeat, and tie as 1, -1, and 0, respectively. The action probability labels may be determined, after the simulation ends, as the ratio of the number of times each child node was visited to the number of times its parent node was visited.
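A minimal sketch of how the labels described above could be computed: the win/lose/tie encoding follows the text (1, -1, 0), while the function and variable names are illustrative:

```python
def action_probability_labels(child_visits: dict) -> dict:
    """Probability label of each action = that child's visit count divided by
    the parent's total visits (i.e. the sum over all children), per S30."""
    total = sum(child_visits.values())
    return {action: n / total for action, n in child_visits.items()}

# Value labels for the final game result, as stated in the description.
OUTCOME_LABEL = {"win": 1, "lose": -1, "tie": 0}

labels = action_probability_labels({"attack": 60, "defend": 30, "item": 10})
```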
Specifically, the game information corresponding to each action is stored in the database; expanding the database increases the diversity of training samples, improves generalization capability, and accelerates training.
And S40, optimizing a loss value function of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain an optimal value network and an optimal action probability network.
The preset loss function of the value network and the action probability network is specifically: L = αL_v + βL_p + ηwᵀw, where L_v is the loss function of the value network with coefficient α, L_p is the loss function of the action probability network with coefficient β, and wᵀw is a regularization penalty term with coefficient η. The loss function of the value network is the mean square error between the value output by the network for a given game state and the actual value label of that state, L_v = (z − v)². The loss function of the action probability network is the cross entropy between the probabilities output by the network for a given game state and the action probability labels, L_p = −Σ_a π_a log p_a.
Specifically, gradient descent is performed through deep-learning back propagation until convergence, so as to obtain the optimal value network and the optimal action probability network.
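The combined loss L = αL_v + βL_p + ηwᵀw can be sketched in pure Python as below. The mean-square-error and cross-entropy forms follow the description above; the coefficient defaults and argument names are illustrative assumptions:

```python
import math

def combined_loss(z, v, pi, p, w, alpha=1.0, beta=1.0, eta=1e-4):
    """L = alpha*Lv + beta*Lp + eta*w^T w (sketch of the patent's loss).
    z: actual value label of the state, v: value-network output,
    pi: action probability labels, p: policy-network output probabilities,
    w: network weight vector."""
    lv = (z - v) ** 2                                            # value loss: MSE
    lp = -sum(pi_i * math.log(p_i) for pi_i, p_i in zip(pi, p))  # cross entropy
    reg = sum(wi * wi for wi in w)                               # L2 penalty w^T w
    return alpha * lv + beta * lp + eta * reg
```

In practice the gradient of this loss would be back-propagated through both networks until convergence; this sketch only shows the scalar value.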
In summary, game state data of the current player of an intelligent game is acquired and the game is simulated; the action to be executed for the current player's game state data is searched according to a preset Monte Carlo search tree, with the game data generated by each executed action used as the current game state for the next search, until the whole round of the game is finished; the game information generated after each action of the whole round is saved to a database; and the loss functions of the preset value network and action probability network are optimized by a back propagation method using sample data obtained from the database, thereby obtaining an optimal value network and an optimal action probability network.
As an improvement of the above solution, the searching, according to the preset Monte Carlo search tree, for the action to be executed for the current player's game state data until the whole round of the game is finished, wherein the game data generated by each found action is used as the current player's game state data in the next search, specifically comprises:
and determining a node corresponding to the game state data of the current player.
In this embodiment, the game state data of the current player is used as the root node of the preset monte carlo tree.
And searching at least one corresponding child node according to the node corresponding to the game state data of the current player.
In this embodiment, the child nodes are accessed layer by layer according to the root node.
And selecting the best child node from at least one corresponding child node as an action executed under the game state data of the current player.
And selecting the child node with the largest access times as the optimal child node.
Or selecting the best sub-node from at least one corresponding sub-node according to a preset selection probability.
Specifically, according to the preset selection probability, the optimal sub-node is selected from at least one corresponding sub-node, so that randomness is increased, and the diversity of training samples is increased.
Or, selecting the child node with the largest confidence value; wherein the confidence value is UCB = E(Q(v)) + c·√(2·ln N(v′) / N(v)), in which N(v) is the number of visits of the current child node, N(v′) is the number of visits of its parent node, and E(Q(v)) is the average benefit obtained by the current child node over the whole round of the game.
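A sketch of confidence-value selection, assuming the standard UCT form UCB = E(Q(v)) + c·√(2·ln N(v′)/N(v)); the exploration constant c and the dictionary-based node representation are assumptions for illustration:

```python
import math

def best_child(parent_visits: int, children: dict, c: float = 1.4) -> str:
    """Return the action whose child maximises the UCB confidence value."""
    def ucb(node):
        if node["visits"] == 0:
            return float("inf")   # unvisited children are explored first
        exploit = node["value_sum"] / node["visits"]      # E(Q(v)): average benefit
        explore = c * math.sqrt(2.0 * math.log(parent_visits) / node["visits"])
        return exploit + explore
    return max(children, key=lambda a: ucb(children[a]))

children = {"attack": {"visits": 10, "value_sum": 7.0},
            "defend": {"visits": 2,  "value_sum": 1.8}}
action = best_child(12, children)  # the rarely tried child gets an exploration bonus
```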
As an improvement of the above solution, the simulating the intelligent game specifically includes:
the visible information is kept unchanged, and the non-visible information is subjected to random generation operation.
Illustratively, each round of simulation requires regenerating the game environment: the player's own state and the parts of the opponent's state that the player can see remain unchanged, while the parts of the opponent's state that the player cannot see are randomly regenerated (as shown in fig. 2, the hidden part is re-randomized). In this example, the items the opponent uses, certain special buffs, and certain special skills that can be used are not visible to the player's side. Each new round of simulation re-randomizes this non-visible information, so that the unknown part of the state is generated afresh each time.
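The re-randomization of non-visible information described above (sometimes called determinization in incomplete-information search) might be sketched as follows; the hidden fields and their domains are invented purely for illustration:

```python
import random

def determinize(visible: dict, hidden_domains: dict, rng=None) -> dict:
    """One simulated 'deal' of the opponent's hidden information.
    The visible part is copied unchanged; each hidden field is drawn at
    random from its domain of plausible values."""
    rng = rng or random.Random()
    state = dict(visible)                      # visible information kept as-is
    for field, domain in hidden_domains.items():
        state[field] = rng.choice(domain)      # hidden information regenerated
    return state

visible = {"opp_level": 28, "opp_hp": 790}
hidden = {"opp_item": ["potion", "bomb", "none"],
          "opp_special_skill": ["heal", "stun"]}
deal = determinize(visible, hidden, random.Random(42))
```

Calling `determinize` once per simulation round gives each search a fresh random completion of the unknown state, as the description requires.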
As an improvement of the above solution, after searching for the action to be executed for the current player's game state data according to the preset Monte Carlo search tree until the whole round of the game is finished, and before saving the game information corresponding to each executed action into the database, the method further includes:
and carrying out back propagation on the nodes corresponding to the game state data of at least one current player to respectively calculate the profit value and the access times corresponding to each current game state.
In this embodiment, back propagation proceeds from the current child node to the root node: according to the final game result (win, loss, or tie) of the last player game state data, the benefit value and visit count stored in each node along the path are updated. When the benefit value is calculated, a moving-average method is used to update the mean benefit iteratively; the higher the calculated benefit value, the greater the potential benefit of that player game state.
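The moving-average back propagation described above can be sketched as below; the incremental-mean update is the standard way to maintain an average benefit without storing every reward. The dictionary-based node representation is an assumption:

```python
def backpropagate(node, reward: float) -> None:
    """Propagate the final result back to the root, updating the visit count
    and the running-average benefit of every node on the path (sketch)."""
    while node is not None:
        node["visits"] += 1
        # incremental (moving-average) update of the mean benefit:
        # mean <- mean + (reward - mean) / visits
        node["mean_value"] += (reward - node["mean_value"]) / node["visits"]
        node = node["parent"]

root = {"visits": 0, "mean_value": 0.0, "parent": None}
leaf = {"visits": 0, "mean_value": 0.0, "parent": root}
backpropagate(leaf, 1.0)   # a win (+1) is propagated up to the root
```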
As an improvement of the above solution, the method further includes:
if the action to be executed for the current player's game state data cannot be found according to the preset Monte Carlo search tree and the whole round of the game is not finished, the actions that may be executed for the current player's game state data are expanded through a preset neural network, and the probability of each possible action is recorded.
In this embodiment, the current player's game state data is input into the preset neural network to obtain the value of that state and the execution probability of each available action; the current node is then expanded into child nodes, one per available action, and the probability value is recorded in each child node. The child nodes cover all actions that can be taken in the current game state, including casting available skills and using items, and the probability corresponding to each action is recorded during expansion.
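A sketch of the expansion step described above: a stand-in policy callable (not a real network) proposes a probability for each legal action, and one child node is created per action with that prior recorded:

```python
def expand(node: dict, state, policy_net) -> None:
    """Expand a leaf: the policy network proposes a probability for each
    available action; one child node is created per action, storing the
    probability as its prior. `policy_net` is a stand-in callable here."""
    priors = policy_net(state)                 # {action: probability}
    for action, prob in priors.items():
        node["children"][action] = {"prior": prob, "visits": 0,
                                    "value_sum": 0.0, "children": {}}

# Illustrative stand-in for the pre-constructed action probability network.
fake_policy = lambda s: {"skill_a": 0.5, "skill_b": 0.3, "use_item": 0.2}

root = {"children": {}, "visits": 0, "value_sum": 0.0}
expand(root, state=None, policy_net=fake_policy)
```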
Referring to fig. 3, a schematic structural diagram of an intelligent game processing device according to an embodiment of the invention is shown.
The embodiment of the invention correspondingly provides an intelligent game processing device, which comprises:
and the acquisition module 10 is used for acquiring the game state data of the current player of the intelligent game and simulating the intelligent game.
The search matching module 20 is configured to search actions corresponding to the game status data of the current player according to a preset monte carlo search tree until the whole round of game is completed; the game data generated by the corresponding action obtained by each time of searching is used as the game state data of the current player in the next searching.
And the storage module 30 is used for storing the game information generated correspondingly after each action executed by the whole round of game into a database.
And the construction module 40 is configured to optimize the pre-constructed value network and the loss value function of the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database, so as to obtain an optimal value network and an optimal action probability network.
In summary, the intelligent game processing apparatus provided by the embodiment of the invention acquires the game state data of the current player of an intelligent game and simulates the game; searches, according to a preset Monte Carlo search tree, for the action to be executed for the current player's game state data, with the game data generated by each executed action used as the current game state when determining the action in the next search, until the whole round of the game is finished; saves the game information generated after each action of the whole round into a database; and optimizes the loss functions of the preset value network and action probability network by a back propagation method using sample data obtained from the database, so as to obtain an optimal value network and an optimal action probability network. In a game with incomplete information, this simplifies the training process of reinforcement learning, makes the action decisions of non-player characters more accurate, improves game quality, enhances playability, and improves user experience.
Referring to fig. 4, a schematic diagram of an intelligent game processing device according to an embodiment of the present invention is shown. The intelligent game processing device of this embodiment includes: a processor 11, a memory 12, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the above embodiments of the intelligent game processing method are implemented. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above device embodiments.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention, for example. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the intelligent game processing device.
The intelligent game processing device can be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The intelligent game processing device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of an intelligent game processing device and is not meant to be limiting; the device may include more or fewer components than shown, certain components may be combined, or different components may be included. For example, the intelligent game processing device may also include input-output devices, network access devices, buses, etc.
The processor 11 may be a central processing unit (Central Processing Unit, CPU), or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor may be any conventional processor; it is the control center of the intelligent game processing device, connecting the various parts of the overall intelligent game processing device using various interfaces and lines.
The memory 12 may be used to store the computer programs and/or modules; the processor implements the various functions of the intelligent game processing device by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function (such as a sound playing function, an image playing function, etc.); the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
Wherein, if the integrated modules/units of the intelligent game processing device are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth.
It should be noted that the above-described apparatus embodiments are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the invention, the connection relation between the modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention; such changes and modifications are also intended to fall within the scope of the invention.
Claims (9)
1. An intelligent game processing method, comprising:
acquiring game state data of a current player of an intelligent game, and simulating the intelligent game;
searching, according to a preset Monte Carlo search tree, for the action to be executed corresponding to the game state data of the current player, until the whole round of the game is finished; wherein the game data generated by the action obtained in each search is taken as the game state data of the current player in the next search;
storing game information generated correspondingly after each action executed by the whole round of game into a database;
optimizing a pre-constructed value network and a loss value function of a pre-constructed action probability network by adopting a back propagation method according to sampling data acquired from the database to obtain an optimal value network and an optimal action probability network;
wherein the game information includes: the game state data of the current player, the probability label of executing each action corresponding to the game state data of the current player, the value label of the current game state and the game result when the whole round of game is finished;
the loss value functions of the pre-constructed value network and the pre-constructed action probability network are specifically as follows:
L = αLv + βLp + ηwᵀw;
wherein Lv is the loss value function of the value network, α is the coefficient of the loss value function of the value network, Lp is the loss value function of the action probability network, β is the coefficient of the loss value function of the action probability network, wᵀw is the regularization penalty term, and η is the coefficient of the regularization penalty term.
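By way of illustration only, and not as part of the claims, the combined loss above can be sketched in Python. The claim only names the terms Lv, Lp and ηwᵀw; taking Lv as a mean-squared error on the value head and Lp as a cross-entropy on the policy head is an assumption (common in AlphaZero-style training), and all function and parameter names are hypothetical:

```python
import numpy as np

def combined_loss(value_pred, value_label, prob_pred, prob_label, w,
                  alpha=1.0, beta=1.0, eta=1e-4):
    """Total loss L = alpha*Lv + beta*Lp + eta*w^T w.

    Lv is assumed to be a mean-squared error on the value network output,
    and Lp a cross-entropy between the action probability network output
    and the probability label; eta*w^T*w is an L2 penalty on the weights w.
    """
    Lv = np.mean((value_pred - value_label) ** 2)            # value loss
    Lp = -np.sum(prob_label * np.log(prob_pred + 1e-12))     # policy loss
    reg = eta * np.dot(w, w)                                 # w^T w penalty
    return alpha * Lv + beta * Lp + reg
```

With α = β = 1 and a small η, the penalty term ηwᵀw discourages large network weights without dominating the two prediction losses.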
2. The intelligent game processing method according to claim 1, wherein searching, according to the preset Monte Carlo search tree, for the action to be executed corresponding to the game state data of the current player until the whole round of the game is finished, with the game data generated by each searched action taken as the game state data of the current player in the next search, specifically comprises:
determining a node corresponding to the game state data of the current player;
searching at least one corresponding child node according to the node corresponding to the game state data of the current player;
and selecting the best child node from at least one corresponding child node as an action executed under the game state data of the current player.
3. The intelligent game processing method according to claim 2, wherein selecting the best child node from the at least one corresponding child node as the action executed under the game state data of the current player specifically comprises:
selecting the child node with the largest access times as the optimal child node;
or,
selecting an optimal child node from at least one corresponding child node according to preset selection probability;
or,
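The two selection alternatives recited in claim 3 can be sketched as follows; the `Node` class and its fields (`visits`, `prior`) are hypothetical names for illustration, not taken from the patent:

```python
import random

class Node:
    """Minimal search-tree node; field names are hypothetical."""
    def __init__(self, action, visits=0, prior=0.0):
        self.action = action      # action leading to this child
        self.visits = visits      # number of times this child was accessed
        self.prior = prior        # preset selection probability
        self.children = []

def best_child_by_visits(node):
    # First alternative: the child node with the largest access count.
    return max(node.children, key=lambda c: c.visits)

def best_child_by_probability(node, rng=random):
    # Second alternative: sample a child according to the preset
    # selection probabilities.
    return rng.choices(node.children,
                       weights=[c.prior for c in node.children])[0]
```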
4. The intelligent game processing method according to claim 1, wherein the simulating the intelligent game specifically comprises:
the visible information is kept unchanged, and the non-visible information is subjected to random generation operation.
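The simulation step of claim 4 amounts to what is often called determinization in imperfect-information game search. A minimal sketch, with illustrative names and a card-game flavor assumed (neither is specified by the patent):

```python
import random

def determinize(visible, full_set, rng=random):
    """Keep the visible information unchanged and randomly generate the
    non-visible information (here: the elements the current player cannot
    see) for one simulation of the game."""
    hidden = [x for x in full_set if x not in visible]
    rng.shuffle(hidden)          # random generation of non-visible info
    return list(visible), hidden
```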
5. The intelligent game processing method according to claim 1, wherein, after searching for the action corresponding to the game state data of the current player according to the preset Monte Carlo search tree until the whole round of the game is finished, and before saving the game information generated after each action of the whole round to the database, the method further comprises:
back-propagating through the node corresponding to the game state data of at least one current player, so as to respectively calculate the profit value and the access times corresponding to each current game state.
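The back-propagation step of claim 5 can be sketched as a walk from a leaf node back to the root, updating the profit value and access count of every game state on the path; the `Node` fields here are hypothetical names:

```python
class Node:
    """Minimal node with the fields needed for back propagation."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0        # access times for this game state
        self.value_sum = 0.0   # accumulated profit value

def backpropagate(leaf, reward):
    # Walk from the leaf to the root, updating every node on the path.
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent
```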
6. The intelligent game processing method according to claim 1, wherein the method further comprises:
if no action to be executed corresponding to the game state data of the current player can be found according to the preset Monte Carlo search tree and the whole round of the game is not finished, expanding the possible actions corresponding to the game state data of the current player through a preset neural network, and recording the probability of each possible action.
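The expansion step of claim 6 can be sketched as follows; `policy_net` stands in for the preset neural network and is a hypothetical callable returning (action, probability) pairs, as are the `Node` field names:

```python
class Node:
    """Minimal node for tree expansion; field names are hypothetical."""
    def __init__(self, action=None, prior=0.0, parent=None):
        self.action, self.prior, self.parent = action, prior, parent
        self.children = []

def expand(node, policy_net, state):
    # When the tree has no child for the current state and the round is not
    # over, enumerate the possible actions via the policy network and record
    # each action's probability as the child's prior.
    for action, prob in policy_net(state):
        node.children.append(Node(action=action, prior=prob, parent=node))
    return node.children

# Usage with a stand-in policy network:
fake_policy = lambda state: [("fold", 0.2), ("call", 0.5), ("raise", 0.3)]
root = Node()
children = expand(root, fake_policy, state=None)
```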
7. An intelligent game processing device, comprising:
the acquisition module is used for acquiring game state data of a current player of the intelligent game and simulating the intelligent game;
the search matching module is used for searching, according to a preset Monte Carlo search tree, for the action to be executed corresponding to the game state data of the current player, until the whole round of the game is finished; wherein the game data generated by the action obtained in each search is taken as the game state data of the current player in the next search;
the storage module is used for storing game information generated correspondingly after each action executed by the whole round of game into the database;
the construction module is used for optimizing a pre-constructed value network and a loss value function of a pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to acquire an optimal value network and an optimal action probability network;
wherein the game information includes: the game state data of the current player, the probability label of executing each action corresponding to the game state data of the current player, the value label of the current game state and the game result when the whole round of game is finished;
the loss value functions of the pre-constructed value network and the pre-constructed action probability network are specifically as follows:
L = αLv + βLp + ηwᵀw;
wherein Lv is the loss value function of the value network, α is the coefficient of the loss value function of the value network, Lp is the loss value function of the action probability network, β is the coefficient of the loss value function of the action probability network, wᵀw is the regularization penalty term, and η is the coefficient of the regularization penalty term.
8. A smart game processing device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the smart game processing method of any one of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the intelligent game processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910966855.2A CN110772794B (en) | 2019-10-12 | 2019-10-12 | Intelligent game processing method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910966855.2A CN110772794B (en) | 2019-10-12 | 2019-10-12 | Intelligent game processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110772794A CN110772794A (en) | 2020-02-11 |
CN110772794B true CN110772794B (en) | 2023-06-16 |
Family
ID=69386163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910966855.2A Active CN110772794B (en) | 2019-10-12 | 2019-10-12 | Intelligent game processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110772794B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114425166A (en) * | 2022-01-27 | 2022-05-03 | 北京字节跳动网络技术有限公司 | Data processing method, data processing device, storage medium and electronic equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119804A (en) * | 2019-05-07 | 2019-08-13 | Anhui University | An Ai Ensitan chess game-playing algorithm based on reinforcement learning
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8545332B2 (en) * | 2012-02-02 | 2013-10-01 | International Business Machines Corporation | Optimal policy determination using repeated stackelberg games with unknown player preferences |
JP6511333B2 (en) * | 2015-05-27 | 2019-05-15 | Hitachi, Ltd. | Decision support system and decision support method |
CN106390456B (en) * | 2016-09-30 | 2018-09-18 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for generating character actions in a game |
US10402723B1 (en) * | 2018-09-11 | 2019-09-03 | Cerebri AI Inc. | Multi-stage machine-learning models to control path-dependent processes |
CN107050839A (en) * | 2017-04-14 | 2017-08-18 | Anhui University | Machine game-playing system for Amazons chess based on the UCT algorithm |
CN109908584A (en) * | 2019-03-13 | 2019-06-21 | Beijing Dajia Internet Information Technology Co., Ltd. | Game information acquisition method, device and electronic equipment |
CN110083748A (en) * | 2019-04-30 | 2019-08-02 | Nanjing University of Posts and Telecommunications | A search method based on adaptive dynamic programming and Monte Carlo tree search |
-
2019
- 2019-10-12 CN CN201910966855.2A patent/CN110772794B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119804A (en) * | 2019-05-07 | 2019-08-13 | Anhui University | An Ai Ensitan chess game-playing algorithm based on reinforcement learning
Also Published As
Publication number | Publication date |
---|---|
CN110772794A (en) | 2020-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Perolat et al. | Mastering the game of stratego with model-free multiagent reinforcement learning | |
Torrado et al. | Deep reinforcement learning for general video game ai | |
CN109621422B (en) | Electronic chess and card decision model training method and device and strategy generation method and device | |
CN107423274B (en) | Artificial intelligence-based game comment content generation method and device and storage medium | |
US10272341B1 (en) | Procedural level generation for games | |
CN110443284B (en) | Artificial intelligence AI model training method, calling method, server and readable storage medium | |
Janusz et al. | Helping ai to play hearthstone: Aaia'17 data mining challenge | |
CN107526682B (en) | Method, device and equipment for generating AI (Artificial Intelligence) behavior tree of test robot | |
CN110841295B (en) | Data processing method based on artificial intelligence and related device | |
Cazenave | Monte carlo beam search | |
Zhang et al. | AlphaZero | |
Baier et al. | Evolutionary MCTS for multi-action adversarial games | |
US20230306787A1 (en) | Movement extraction method and apparatus for dance video, computer device, and storage medium | |
CN109011580B (en) | Incomplete game card face obtaining method and device, computer equipment and storage medium | |
JP7031811B2 (en) | A method and system for training player characters in sports games using spatial dualization | |
CN110598853B (en) | Model training method, information processing method and related device | |
CN110772794B (en) | Intelligent game processing method, device, equipment and storage medium | |
CN111589120A (en) | Object control method, computer device, and computer-readable storage medium | |
CN112274935A (en) | AI model training method, use method, computer device and storage medium | |
CN109948062A (en) | A kind of target matching method, device, server, system and storage medium | |
CN113946604B (en) | Staged go teaching method and device, electronic equipment and storage medium | |
Deng et al. | A study of prisoner's dilemma game model with incomplete information | |
Chaslot et al. | Meta monte-carlo tree search for automatic opening book generation | |
CN110598182A (en) | Information prediction method and related equipment | |
Fatemi et al. | Rating and generating Sudoku puzzles based on constraint satisfaction problems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |