CN110772794A

CN110772794A - Intelligent game processing method, device, equipment and storage medium

Info

Publication number: CN110772794A
Application number: CN201910966855.2A
Authority: CN
Inventors: 徐波
Original assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Multi Benefit Network Co Ltd; Guangzhou Duoyi Network Co Ltd
Current assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Multi Benefit Network Co Ltd; Guangzhou Duoyi Network Co Ltd
Priority date: 2019-10-12
Filing date: 2019-10-12
Publication date: 2020-02-11
Anticipated expiration: 2039-10-12
Also published as: CN110772794B

Abstract

The invention discloses an intelligent game processing method comprising the following steps: acquiring the game state data of the current player of an intelligent game and simulating the game; using a preset Monte Carlo search tree to search for the action to execute in the current player's game state, repeating until the whole round of the game ends, where the game data produced by the action found in each search serves as the current player's game state data for the next search; storing in a database the game information generated after each action executed during the round; and, based on data sampled from the database, optimizing the loss function of a pre-constructed value network and a pre-constructed action probability network by back propagation to obtain an optimal value network and an optimal action probability network. Embodiments of the invention also disclose an intelligent game processing apparatus, device and storage medium. Together, the embodiments address the inaccurate behavior decisions of non-player characters, and the resulting loss of game quality, in games with incomplete information.

Description

Intelligent game processing method, device, equipment and storage medium

Technical Field

The invention relates to the field of computer technology, and in particular to an intelligent game processing method, apparatus, device and storage medium.

Background

In a turn-based game, players take turns and can act only on their own turn. This makes such games highly engaging, and they have become one of people's main forms of entertainment. The behavior decisions of non-player characters (NPCs) in a turn-based game are often an important factor in game quality and user experience. Traditional game artificial intelligence is implemented with a state machine or a behavior tree, in which different states trigger the corresponding NPC operations.

Following AlphaGo's success at Go, reinforcement learning has been applied more and more widely in games. Board games such as Go, however, are games of complete information: the states of both sides are public, nothing is hidden, and there is no random factor. In a game of incomplete information, a player cannot fully know the opponent's information, the effect of a move is random, and there are many uncertain factors. This makes reinforcement learning very difficult to train, so NPC behavior decisions become inaccurate and game quality drops.

Disclosure of Invention

Embodiments of the invention provide an intelligent game processing method, apparatus, device and storage medium that can effectively solve the prior-art problem that, in games with incomplete information, NPC behavior decisions are inaccurate and game quality is reduced.

An embodiment of the present invention provides an intelligent game processing method, including:

acquiring game state data of a current player of an intelligent game, and simulating the intelligent game;

searching, according to a preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round of the game ends, wherein the game data produced by the action found in each search is used as the current player's game state data for the next search;

storing, in a database, the game information generated after each action executed during the whole round;

and optimizing the loss function of a pre-constructed value network and a pre-constructed action probability network by back propagation, based on data sampled from the database, to obtain an optimal value network and an optimal action probability network.

As an improvement of the above scheme, searching, according to the preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round ends, with the game data produced by the action found in each search used as the current player's game state data for the next search, specifically comprises:

determining the node corresponding to the current player's game state data;

searching for at least one child node of the node corresponding to the current player's game state data;

and selecting the best child node from the at least one child node as the action to be executed in the current player's game state.

As an improvement of the above scheme, selecting the best child node from the at least one child node as the action to be executed in the current player's game state specifically comprises:

selecting the child node with the most visits as the best child node;

or, alternatively,

selecting the best child node from the at least one child node according to a preset selection probability;

or, alternatively,

selecting the child node with the maximum confidence value, wherein

UCB is the confidence value, N(v) is the visit count of the current child node, N(v') is the child-node visit count, and E(Q(v)) is the average profit obtained by the current child node over the whole round of the game.

As an improvement of the above scheme, simulating the intelligent game specifically comprises:

keeping the visible information unchanged and randomly generating the invisible information.

As an improvement of the above scheme, after searching, according to the preset Monte Carlo search tree, for the actions to be executed in the current player's game state until the whole round ends, and before storing in the database the game information generated after each action executed during the round, the method further comprises:

back-propagating through the nodes corresponding to at least one item of the current player's game state data, so as to calculate the profit value and visit count of each current game state.

As an improvement of the above scheme, the method further comprises:

if the action to be executed in the current player's game state cannot be found in the preset Monte Carlo search tree and the whole round has not ended, expanding the executable actions for the current player's game state through a preset neural network, and recording the probability of each executable action.

As an improvement of the above scheme, the game information comprises:

the current player's game state data, the probability label for executing each action in that state, the current game state value label, and the game result when the whole round ends.

Another embodiment of the invention correspondingly provides an intelligent game processing apparatus, comprising:

an acquisition module, configured to acquire game state data of a current player of an intelligent game and simulate the intelligent game;

a search-and-match module, configured to search, according to the preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round ends, wherein the game data produced by the action found in each search is used as the current player's game state data for the next search;

a storage module, configured to store in a database the game information generated after each action executed during the whole round;

and a construction module, configured to optimize the loss function of the pre-constructed value network and the pre-constructed action probability network by back propagation, based on data sampled from the database, to obtain an optimal value network and an optimal action probability network.

Another embodiment of the invention provides an intelligent game processing device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor; when executing the computer program, the processor implements the intelligent game processing method of the above embodiment.

Another embodiment of the invention provides a computer-readable storage medium containing a stored computer program; when the program runs, it controls the device on which the storage medium resides to execute the intelligent game processing method of the above embodiment.

Compared with the prior art, the disclosed method, apparatus, device and storage medium acquire and simulate the current player's game state data, use a preset Monte Carlo search tree to search for the action to execute in that state, and treat the game data produced by each executed action as the current game state for the next search until the whole round ends. The game information generated after each action of the round is stored in a database, and the loss function of the preset value network and action probability network is optimized by back propagation on data sampled from the database to obtain the optimal value network and action probability network. This simplifies reinforcement-learning training for games of incomplete information, makes NPC behavior decisions more accurate, and improves game quality, playability and user experience.

Drawings

FIG. 1 is a flow chart of an intelligent game processing method according to an embodiment of the invention;

FIG. 2 is a schematic flow chart of a simulation using the preset Monte Carlo search tree according to an embodiment of the invention;

FIG. 3 is a schematic structural diagram of an intelligent game processing apparatus according to an embodiment of the invention;

FIG. 4 is a schematic structural diagram of an intelligent game processing device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of an intelligent game processing method according to an embodiment of the present invention.

The embodiment of the invention provides an intelligent game processing method, which comprises the following steps:

S10, acquiring the game state data of the current player of the intelligent game, and simulating the intelligent game.

The current player's game state data include the player's faction (sect), level, health points (HP), magic points (MP), equipment, carried pets, buffs and debuffs, and so on. These data are combined into a single vector as the input.
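To make the vector input concrete, the following Python sketch concatenates the attributes listed above into one flat feature vector. The field names, the vocabulary sizes (`NUM_FACTIONS`, `NUM_STATUS_EFFECTS`), the normalization and the one-hot/multi-hot encoding are all hypothetical; the text says only that the data are combined into a vector.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sizes; the text does not specify any encoding.
NUM_FACTIONS = 4
MAX_LEVEL = 100
NUM_STATUS_EFFECTS = 8

@dataclass
class PlayerState:
    faction: int            # index of the player's faction/sect
    level: int
    hp: float
    max_hp: float
    mp: float
    max_mp: float
    equipment_score: float  # hypothetical scalar summary of equipment
    pet_count: int
    buffs: List[int] = field(default_factory=list)    # active buff ids
    debuffs: List[int] = field(default_factory=list)  # active debuff ids

def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def multi_hot(indices: List[int], size: int) -> List[float]:
    vec = [0.0] * size
    for i in indices:
        vec[i] = 1.0
    return vec

def encode_state(p: PlayerState) -> List[float]:
    """Concatenate the player's attributes into one flat feature vector."""
    return (
        one_hot(p.faction, NUM_FACTIONS)
        + [p.level / MAX_LEVEL, p.hp / p.max_hp, p.mp / p.max_mp,
           p.equipment_score, float(p.pet_count)]
        + multi_hot(p.buffs, NUM_STATUS_EFFECTS)
        + multi_hot(p.debuffs, NUM_STATUS_EFFECTS)
    )
```

Normalizing level, HP and MP to ratios keeps the inputs on a comparable scale, which usually helps network training; any fixed ordering works as long as training and inference use the same one.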

In this embodiment, the current player's game state data may be acquired from an established game data model. This is only one implementation; any other way of acquiring the current player's game state data may also be used.

S20, searching, according to the preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round ends; the game data produced by the action found in each search is used as the current player's game state data for the next search.

Referring to FIG. 2, in the preset Monte Carlo search tree each node corresponds to a game state, and each child of a node represents the new state produced by taking one of the available actions in that state, with one child per action. In this example, the actions available in a state include the player casting different skills, using items, and so on. Because the game is random, the same action in the same state may produce different effects; each child node therefore corresponds only to the action taken, not to a particular outcome.

Tree node attributes are defined for the Monte Carlo search tree. They comprise the index of the parent node, the indexes of all child nodes, the number of times the node has been visited, the action probability that the pre-constructed action probability network outputs for the node, and the final game profit value after passing through the node, as estimated by the pre-constructed value network.

In this embodiment, when simulating one game, a new Monte Carlo tree may be created at every step (i.e., for every action performed), or the previously created tree may be reused: the game data produced from the root node by the action actually taken become the current player's game state data, and this continues until the current game ends. A new Monte Carlo tree is created when the next game starts.
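A minimal sketch of the node attributes and of tree reuse, assuming nodes are stored in a flat list so that parent and child references are plain indexes, as the attribute list above suggests. Keying children by action and the `SearchTree.advance` helper are illustrative assumptions, not the patented data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    parent: Optional[int]                 # index of the parent node (None for the root)
    action: Optional[int] = None          # action that led here from the parent
    children: Dict[int, int] = field(default_factory=dict)  # action -> child index
    visits: int = 0                       # number of times the node was visited
    prior: float = 0.0                    # probability from the action probability network
    value: float = 0.0                    # running average of outcomes through this node

class SearchTree:
    def __init__(self) -> None:
        self.nodes: List[Node] = [Node(parent=None)]
        self.root = 0

    def add_child(self, parent: int, action: int, prior: float) -> int:
        index = len(self.nodes)
        self.nodes.append(Node(parent=parent, action=action, prior=prior))
        self.nodes[parent].children[action] = index
        return index

    def advance(self, action: int) -> None:
        """Reuse the subtree of the action actually taken as the new root."""
        root = self.nodes[self.root]
        if action in root.children:
            self.root = root.children[action]
            self.nodes[self.root].parent = None  # detach from the old root
        else:
            # The action was never expanded, so start a fresh tree.
            self.nodes = [Node(parent=None)]
            self.root = 0
```

Re-rooting leaves unused sibling subtrees in `nodes`; a production version might compact the list, which is omitted here for brevity.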

Specifically, whenever the next action must be determined from the current player's game state data, it is searched for through the preset Monte Carlo tree; the game data produced by the action found become the current player's game state data for the next search, and the search continues until the simulated intelligent game ends.
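The per-step loop described above can be sketched as a higher-order function. Here `search`, `step` and `is_over` are hypothetical callables standing in for the Monte Carlo search, the game simulator and the end-of-round check, and `max_steps` is a safety bound added for the sketch.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")

def play_round(
    state: State,
    search: Callable[[State], Action],       # e.g. an MCTS search from this state
    step: Callable[[State, Action], State],  # applies the action in the simulator
    is_over: Callable[[State], bool],        # end-of-round check
    max_steps: int = 1000,
) -> List[Tuple[State, Action]]:
    """Search one action per step; the resulting state seeds the next search."""
    trajectory: List[Tuple[State, Action]] = []
    for _ in range(max_steps):
        if is_over(state):
            break
        action = search(state)
        trajectory.append((state, action))
        state = step(state, action)  # the new game data become the next search's state
    return trajectory
```

The returned (state, action) pairs are the per-action records that step S30 would then label and store.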

S30, storing in the database the game information corresponding to each action executed during the whole round, wherein the game information includes the current player's game state data, the probability label for executing each action in that state, and the current player's game state value label.

In this embodiment, the current game state value label records a win, loss or tie as 1, -1 and 0 respectively. After the simulation ends, the action probability label of each child node is set to the ratio of the number of times that child node was visited to the number of times its parent node was visited.
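The two labels can be sketched as follows. The function names and the string result codes are hypothetical; the ratio and the 1/-1/0 mapping are as stated above.

```python
from typing import Dict

def policy_label(child_visits: Dict[int, int], parent_visits: int) -> Dict[int, float]:
    """Action probability label: each child's visit count divided by its parent's."""
    return {action: n / parent_visits for action, n in child_visits.items()}

# Value label for the state: win, loss or tie mapped to 1, -1 and 0.
RESULT_VALUE = {"win": 1.0, "loss": -1.0, "tie": 0.0}

def value_label(result: str) -> float:
    return RESULT_VALUE[result]
```

If the parent's own visit count exceeds the sum of its children's (for example when the parent's first visit is counted before it is expanded), these labels sum to slightly less than 1; dividing by the children's total instead is a common variant.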

Specifically, storing the game information for each action in the database enlarges the database, which increases the diversity of training samples, improves generalization, and speeds up training.
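A minimal in-memory stand-in for the database and its sampling step, assuming a bounded first-in-first-out store and uniform random sampling. The class name, the capacity limit and the sampling scheme are assumptions; the text specifies neither the storage engine nor how samples are drawn.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Tuple

# (state vector, action probability label, state value label)
Sample = Tuple[List[float], Dict[int, float], float]

class ReplayDatabase:
    """Hypothetical in-memory replacement for the training database."""

    def __init__(self, capacity: int = 100_000, seed: int = 0) -> None:
        self.samples: Deque[Sample] = deque(maxlen=capacity)  # oldest entries drop first
        self.rng = random.Random(seed)

    def add_round(self, states: List[List[float]], policies: List[Dict[int, float]],
                  result_value: float) -> None:
        """Store every step of one finished round, labelled with the round's outcome."""
        for state, policy in zip(states, policies):
            self.samples.append((state, policy, result_value))

    def sample(self, batch_size: int) -> List[Sample]:
        """Draw a uniform random batch without replacement."""
        k = min(batch_size, len(self.samples))
        return self.rng.sample(list(self.samples), k)
```

Labelling every step with the same outcome assumes all stored states are seen from one side's perspective; a two-sided store would flip the sign for the opponent's states.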

S40, optimizing the loss function of the pre-constructed value network and the pre-constructed action probability network by back propagation, based on data sampled from the database, to obtain the optimal value network and the optimal action probability network.

The loss function of the preset value network and action probability network is as follows:

$L = \alpha L_v + \beta L_p + \eta\, w^{\top} w$, where $L_v$ is the loss function of the value network with coefficient $\alpha$, $L_p$ is the loss function of the action probability network with coefficient $\beta$, and $w^{\top} w$ is a regularization penalty term with coefficient $\eta$. The loss function of the value network is the mean squared error between the value that the value network outputs for a game state and that state's actual value label.
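The combined loss can be evaluated on a batch as sketched below. The text defines $L_v$ as a mean squared error but does not define $L_p$; the sketch assumes a cross-entropy between the action probability label and the network's output, as in AlphaGo Zero-style training. The function name and default coefficients are hypothetical, and `weights` stands for all network parameters flattened into one list.

```python
import math
from typing import Sequence

def combined_loss(
    values: Sequence[float], value_labels: Sequence[float],
    probs: Sequence[Sequence[float]], prob_labels: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha: float = 1.0, beta: float = 1.0, eta: float = 1e-4,
) -> float:
    n = len(values)
    # L_v: mean squared error between predicted and labelled state values.
    l_v = sum((v - z) ** 2 for v, z in zip(values, value_labels)) / n
    # L_p (assumed): cross-entropy between the label distribution and the prediction.
    eps = 1e-12  # guards against log(0)
    l_p = -sum(
        sum(t * math.log(p + eps) for t, p in zip(target, pred))
        for target, pred in zip(prob_labels, probs)
    ) / n
    # w^T w: L2 penalty over all (flattened) network weights.
    l2 = sum(w * w for w in weights)
    return alpha * l_v + beta * l_p + eta * l2
```

In a deep-learning framework the same three terms would be computed on tensors so that back propagation can differentiate them automatically.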

Specifically, gradient descent is performed via deep-learning back propagation until convergence, yielding the optimal value network and the optimal action probability network.
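As a toy illustration of "gradient descent until convergence", the sketch below fits a single hypothetical weight $w$ of a linear value model under a mean-squared-error loss plus the $\eta w^2$ penalty. A real value network has many weights and obtains its gradient by back propagation; here the gradient is written out by hand, and the learning rate and tolerance are assumptions.

```python
from typing import Sequence

def fit_value_weight(xs: Sequence[float], ys: Sequence[float], eta: float = 0.1,
                     lr: float = 0.05, tol: float = 1e-10, max_iter: int = 100_000) -> float:
    """Minimise L(w) = mean((w*x - y)**2) + eta * w**2 by gradient descent."""
    n = len(xs)
    w = 0.0
    for _ in range(max_iter):
        # Analytic gradient of the loss; back propagation plays this role in a network.
        grad = 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / n + 2.0 * eta * w
        if abs(grad) < tol:  # treat a vanishing gradient as convergence
            break
        w -= lr * grad
    return w
```

For this quadratic loss the minimizer is $w^* = \overline{xy} / (\overline{x^2} + \eta)$, so the penalty pulls the fitted weight slightly toward zero compared with plain least squares.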

In summary, the method acquires and simulates the current player's game state data, uses the preset Monte Carlo search tree to find the action to execute, and feeds the game data produced by each action back in as the current game state for the next search until the whole round ends. It stores the game information generated after each action of the round in a database and optimizes the loss function of the preset value network and action probability network by back propagation on data sampled from the database, obtaining the optimal value network and action probability network. This simplifies reinforcement-learning training for games of incomplete information, makes NPC behavior decisions more accurate, and improves game quality, playability and user experience.

Determining the node corresponding to the current player's game state data.

In this embodiment, the current player's game state data are used as the root node of the preset Monte Carlo tree.

Searching for at least one child node of the node corresponding to the current player's game state data.

In this embodiment, child nodes are visited layer by layer starting from the root node.

Selecting the child node with the most visits as the best child node.

Or, selecting the best child node from the at least one child node according to a preset selection probability.

Specifically, selecting the best child node according to a preset selection probability adds randomness and increases the diversity of training samples.

Or, selecting the child node with the maximum confidence value (UCB), with UCB, N(v), N(v') and E(Q(v)) as defined in the Disclosure of Invention.
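The three selection rules can be sketched as below. The UCB formula itself does not survive in this text, so `max_ucb` assumes the standard UCB1 form $E(Q(v)) + c\sqrt{\ln N_{\text{parent}} / N(v)}$ with a hypothetical exploration constant `c`; likewise, drawing in proportion to visit counts raised to `1 / temperature` is only one possible reading of "a preset selection probability".

```python
import math
import random
from typing import Dict, Optional

def most_visited(visits: Dict[int, int]) -> int:
    """Rule 1: the child with the most visits."""
    return max(visits, key=visits.get)

def sample_by_visits(visits: Dict[int, int], temperature: float = 1.0,
                     rng: Optional[random.Random] = None) -> int:
    """Rule 2 (assumed form): draw with probability proportional to visits ** (1 / T)."""
    rng = rng or random.Random()
    actions = list(visits)
    weights = [visits[a] ** (1.0 / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def max_ucb(visits: Dict[int, int], mean_value: Dict[int, float],
            parent_visits: int, c: float = math.sqrt(2)) -> int:
    """Rule 3 (assumed UCB1): mean value plus an exploration bonus."""
    def ucb(a: int) -> float:
        if visits[a] == 0:
            return math.inf  # try unvisited children first
        return mean_value[a] + c * math.sqrt(math.log(parent_visits) / visits[a])
    return max(visits, key=ucb)
```

Rule 1 is deterministic and suits final play, rule 2 adds the randomness the text credits with diversifying training samples, and rule 3 balances exploitation against exploration during the search itself.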

Illustratively, the game environment must be regenerated for each simulation round: the player's own state and the visible part of the opponent's state remain unchanged, while the invisible part of the opponent's state is regenerated. In this example, the invisible part of the opponent's state includes the item in use, certain special buffs or debuffs, and certain special skills that may be used. This invisible information is randomly regenerated at the start of each new simulation round, so the unknown portion of the starting state is re-randomized every time.
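The regeneration step above is a form of determinization: visible fields are copied and hidden fields are resampled before each simulation round. In the sketch below, the `OpponentView` fields and the candidate pools are hypothetical, and hidden values are drawn uniformly; a more careful version would sample only values consistent with what has already been observed.

```python
import random
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class OpponentView:
    level: int                      # visible
    hp: float                       # visible
    item_in_use: int                # hidden
    hidden_buffs: Tuple[int, ...]   # hidden
    hidden_skills: Tuple[int, ...]  # hidden

# Hypothetical pools of possible hidden values.
ITEM_POOL = list(range(10))
BUFF_POOL = list(range(8))
SKILL_POOL = list(range(20))

def determinize(obs: OpponentView, rng: random.Random) -> OpponentView:
    """Keep visible fields unchanged; resample hidden fields for a new simulation round."""
    return replace(
        obs,
        item_in_use=rng.choice(ITEM_POOL),
        hidden_buffs=tuple(sorted(rng.sample(BUFF_POOL, k=rng.randint(0, 2)))),
        hidden_skills=tuple(sorted(rng.sample(SKILL_POOL, k=3))),
    )
```

Running many searches over different determinized states and aggregating their statistics lets the tree search average over the opponent's unknown information instead of committing to one guess.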

As an improvement of the above scheme, after the search for the actions to be executed in the current player's game state, using the preset Monte Carlo search tree, has been completed and the whole round has ended, and before the game information corresponding to each action of the round is stored in the database, the method further comprises:

In this embodiment, information is propagated from the current child node back to the root node. Working backwards from the last player game state data, the profit value and visit count of every node that led to the result (win, loss or tie) are updated. The average profit is updated iteratively with a moving-average calculation. The higher a node's profit value, the greater the potential profit of the corresponding player game state data, as judged by this calculation.
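The backtracking update can be sketched as below, using a running (incremental) mean as the moving-average calculation. The trimmed `Node` type and the optional sign flip between levels, common in two-player zero-sum trees, are assumptions; the text does not say from whose perspective the profit is recorded.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    parent: Optional[int]  # index of the parent node (None for the root)
    visits: int = 0
    value: float = 0.0     # running mean of outcomes seen through this node

def backpropagate(nodes: List[Node], leaf: int, result: float, alternate: bool = False) -> None:
    """Walk from the leaf to the root, updating visit counts and mean profit.

    result is 1 (win), -1 (loss) or 0 (tie). If alternate is True, the sign is
    flipped at each level (an assumed convention for two-player trees).
    """
    index: Optional[int] = leaf
    while index is not None:
        node = nodes[index]
        node.visits += 1
        node.value += (result - node.value) / node.visits  # incremental mean update
        if alternate:
            result = -result
        index = node.parent
```

The incremental form gives the same result as recomputing the average over all outcomes, but needs only the current mean and visit count.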

As an improvement of the above, the method further comprises:

If the action to be executed in the current player's game state cannot be found in the preset Monte Carlo search tree and the whole round has not ended, the actions that may be executed in the current player's game state are expanded through a preset neural network, and the probability of each such action is recorded.

In this embodiment, the current player's game state data are input into the preset neural network to obtain the value of that state and the execution probability of each available action. The current node is then expanded into child nodes, one per available action, and the probability value is recorded in each child node. The child nodes cover all operations that can be taken in the current game state, including releasing available skills and using items, and the probability of each operation is recorded during expansion.
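A sketch of the expansion step. The network is represented by a hypothetical `evaluate` callable returning a value and an action-to-probability map, and the action names in the usage below are illustrative. Renormalizing the probabilities over the actions that are actually legal, with a uniform fallback, is an assumption not stated in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

@dataclass
class Node:
    parent: Optional[int]
    prior: float = 0.0
    children: Dict[str, int] = field(default_factory=dict)  # action -> child index
    visits: int = 0
    value: float = 0.0

# Returns (state value, {action: probability}); stands in for the preset network.
Evaluator = Callable[[Sequence[float]], Tuple[float, Dict[str, float]]]

def expand(nodes: List[Node], index: int, state: Sequence[float],
           legal_actions: List[str], evaluate: Evaluator) -> float:
    """Create one child per legal action, recording the network's probability for it."""
    value, policy = evaluate(state)
    masked = {a: policy.get(a, 0.0) for a in legal_actions}
    total = sum(masked.values())
    for a in legal_actions:
        prior = masked[a] / total if total > 0 else 1.0 / len(legal_actions)
        nodes.append(Node(parent=index, prior=prior))
        nodes[index].children[a] = len(nodes) - 1
    return value  # leaf evaluation, passed on to backpropagation
```

Returning the network's value estimate lets the search back-propagate an evaluation immediately, instead of playing the game out to the end from every new leaf.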

FIG. 3 is a schematic structural diagram of an intelligent game processing apparatus according to an embodiment of the invention.

An embodiment of the invention correspondingly provides an intelligent game processing apparatus, comprising:

An acquisition module 10, configured to acquire game state data of a current player of an intelligent game and simulate the intelligent game.

A search-and-match module 20, configured to search, according to a preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round ends; the game data produced by the action found in each search is used as the current player's game state data for the next search.

A storage module 30, configured to store in a database the game information generated after each action executed during the whole round.

A construction module 40, configured to optimize the loss function of the pre-constructed value network and the pre-constructed action probability network by back propagation, based on data sampled from the database, to obtain an optimal value network and an optimal action probability network.

In summary, the intelligent game processing apparatus provided by the embodiments acquires and simulates the current player's game state data, uses a preset Monte Carlo search tree to find the action to execute in that state, and treats the game data produced by each executed action as the current game state for the next search until the whole round ends. It stores the game information generated after each action of the round in a database and optimizes the loss function of the preset value network and action probability network by back propagation on data sampled from the database, obtaining the optimal value network and action probability network. This simplifies reinforcement-learning training for games of incomplete information, makes NPC behavior decisions more accurate, and improves game quality, playability and user experience.

Referring to FIG. 4, a schematic diagram of an intelligent game processing device according to an embodiment of the invention is shown. The intelligent game processing device of this embodiment includes a processor 11, a memory 12, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the intelligent game processing method embodiments above, or alternatively the functions of the modules/units of the apparatus embodiments above.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the intelligent game processing device.

The intelligent game processing device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device. It may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the schematic diagram is merely an example of the intelligent game processing device and does not limit it: the device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and so on.

Processor 11 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the intelligent game processing device and connects all parts of the device through various interfaces and lines.

Memory 12 may be used to store the computer programs and/or modules; the processor implements the various functions of the intelligent game processing device by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

If the modules/units integrated in the intelligent game processing device are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the flow of the methods in the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so on.

It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. An intelligent game processing method, comprising:

2. The intelligent game processing method according to claim 1, wherein searching, according to a preset Monte Carlo search tree, for the action to be executed in the current player's game state until the whole round of the game ends, with the game data produced by the action found in each search used as the current player's game state data for the next search, specifically comprises:

determining a node corresponding to the game state data of the current player;

3. The intelligent game processing method according to claim 2, wherein selecting the best child node from the at least one child node as the action to be executed in the current player's game state specifically comprises:

selecting the child node with the most access times as the optimal child node;

or, alternatively,

selecting the child node with the maximum confidence value; wherein,

4. The intelligent game processing method of claim 1, wherein the simulating the intelligent game specifically comprises:

5. The intelligent game processing method according to claim 1, wherein, after searching, according to the preset Monte Carlo search tree, for the actions to be executed in the current player's game state until the whole round ends, and before storing in the database the game information generated after each action executed during the round, the method further comprises:

back-propagating through at least one node corresponding to the current game state data, so as to calculate the profit value and visit count of each current game state.

6. The intelligent game processing method of claim 1, the method further comprising:

7. The intelligent game processing method of claim 1, wherein the game information comprises:

8. An intelligent game processing apparatus, comprising:

9. An intelligent game processing device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the intelligent game processing method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the intelligent game processing method according to any one of claims 1 to 7.