CN110772794A - Intelligent game processing method, device, equipment and storage medium - Google Patents

Intelligent game processing method, device, equipment and storage medium

Publication number
CN110772794A
Authority
CN
China
Prior art keywords
game
action
state data
game state
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910966855.2A
Other languages
Chinese (zh)
Other versions
CN110772794B (en
Inventor
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Multi Benefit Network Co Ltd
Guangzhou Duoyi Network Co Ltd
Original Assignee
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Multi Benefit Network Co Ltd
Guangzhou Duoyi Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD, Multi Benefit Network Co Ltd, Guangzhou Duoyi Network Co Ltd filed Critical GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Priority to CN201910966855.2A priority Critical patent/CN110772794B/en
Publication of CN110772794A publication Critical patent/CN110772794A/en
Application granted granted Critical
Publication of CN110772794B publication Critical patent/CN110772794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/6027Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent game processing method, which comprises the following steps: acquiring game state data of a current player of an intelligent game, and simulating the intelligent game; searching the action correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search; storing game information correspondingly generated after each action executed by the whole round of game to a database; and optimizing the loss value function of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network. The embodiment of the invention also discloses an intelligent game processing device, equipment and a storage medium, and the problems of inaccurate behavior decision of non-player characters and reduced game quality in the incomplete information game are solved by adopting a plurality of embodiments.

Description

Intelligent game processing method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to an intelligent game processing method, an intelligent game processing device, intelligent game processing equipment and a storage medium.
Background
In a turn-based game, players act in turns and can operate only when their own turn comes; such games are highly entertaining and have become one of people's main forms of entertainment. The behavior decisions of non-player characters in a turn-based game are often an important factor affecting game quality and user experience. Traditional game artificial intelligence is implemented through a state machine or a behavior tree, in which different states trigger the corresponding operations of the non-player character.
With the success of AlphaGo in Go, reinforcement learning has been applied more and more widely in games. However, board games such as Go are complete-information games: the states of both sides are fully public, there is no hidden information, and there are no random factors. In an incomplete-information game, a player cannot fully know the opponent's information, the effects of moves are random, and many uncertain factors exist, which makes reinforcement learning training very difficult; as a result, the behavior decisions of non-player characters are inaccurate and game quality is reduced.
Disclosure of Invention
The embodiments of the invention provide an intelligent game processing method, device, equipment and storage medium, which can effectively solve the problem in the prior art that, in an incomplete-information game, the behavior decisions of non-player characters are inaccurate and game quality is reduced.
An embodiment of the present invention provides an intelligent game processing method, including:
acquiring game state data of a current player of an intelligent game, and simulating the intelligent game;
searching the action correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search;
storing game information correspondingly generated after each action executed by the whole round of game to a database;
and optimizing the loss value function of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network.
As an improvement of the above scheme, the action correspondingly executed by the game state data of the current player is searched according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search, and the method specifically comprises the following steps:
determining a node corresponding to the game state data of the current player;
searching at least one corresponding child node according to the node corresponding to the game state data of the current player;
and selecting the best child node from at least one corresponding child node as the action executed under the game state data of the current player.
As an improvement of the above solution, the action performed under the game state data of the current player is selected from at least one corresponding child node; the method specifically comprises the following steps:
selecting the child node with the most access times as the optimal child node;
or,
selecting an optimal child node from at least one corresponding child node according to a preset selection probability;
or,
selecting the child node with the maximum confidence value; wherein,
Figure BDA0002230763230000021
wherein UCB is the confidence value, N(v) is the number of visits to the current child node, N(v') is the number of visits to the child node, and E(Q(v)) is the average profit obtained by the current child node over the whole round of game.
As an improvement of the above scheme, the simulating the intelligent game specifically includes:
and keeping the visible information unchanged, and performing random generation operation on the invisible information.
As an improvement of the above scheme, after the searching for the action correspondingly executed by the game state data of the current player according to the preset Monte Carlo search tree until the whole round of game is finished, and before the storing of the game information correspondingly generated after each action executed by the whole round of game to the database, the method further includes:
and performing back propagation on the nodes corresponding to the game state data of at least one current player to respectively calculate the income value and the access times corresponding to each current game state.
As an improvement of the above, the method further comprises:
if the action executed corresponding to the game state data of the current player cannot be found according to the preset Monte Carlo search tree and the whole round of game is not finished, the action which can be executed corresponding to the game state data of the current player is expanded through a preset neural network, and the probability of each action which can be executed is recorded.
As an improvement of the above, the game information includes:
the game state data of the current player, the probability labels for executing each action corresponding to that game state data, the current game state value label, and the game result when the whole round of game is finished.
Another embodiment of the present invention correspondingly provides an intelligent game processing apparatus, including:
an acquisition module, used for acquiring game state data of a current player of an intelligent game and simulating the intelligent game;
the searching and matching module is used for searching the action correspondingly executed by the game state data of the current player according to the preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search;
the storage module is used for storing game information correspondingly generated after each action executed by the whole round of game to a database;
and the construction module is used for optimizing the loss value functions of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network.
Another embodiment of the present invention provides an intelligent game processing device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements the intelligent game processing method according to the above embodiment of the present invention.
Another embodiment of the present invention provides a storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the intelligent game processing method described in the above embodiment of the present invention.
Compared with the prior art, the intelligent game processing method, device, equipment and storage medium disclosed by the embodiments of the invention acquire the game state data of the current player of an intelligent game and simulate the game; search, according to a preset Monte Carlo search tree, for the action to be executed for the game state data of the current player, with the game data generated by that action serving as the current game state for the next search, until the whole round of game is finished; store the game information generated after each action executed during the whole round of game in a database; and optimize the loss value function of the pre-constructed value network and action probability network by a back propagation method according to sampling data obtained from the database, to obtain the optimal value network and the optimal action probability network. This simplifies the training process of reinforcement learning in an incomplete-information game, makes the behavior decisions of non-player characters more accurate, improves game quality, enhances playability, and improves user experience.
Drawings
FIG. 1 is a flow chart of a method for processing an intelligent game according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a preset Monte Carlo search tree simulation according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an intelligent game processing device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an intelligent game processing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of an intelligent game processing method according to an embodiment of the present invention.
The embodiment of the invention provides an intelligent game processing method, which comprises the following steps:
S10, acquiring the game state data of the current player of the intelligent game, and simulating the intelligent game.
Wherein the game state data of the current player includes: the player's sect, level, health value, magic value, equipment information, carried-pet information, gain effects, reduction effects and the like in the game battle. These data are combined into a vector form for input.
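As a minimal sketch of combining such attributes into an input vector, the snippet below flattens a dictionary of numeric attributes in a fixed order. The feature names, their numeric encoding and the zero default for missing fields are assumptions made for illustration; the source does not specify the encoding.

```python
from typing import Dict, List

# Hypothetical feature order; the names are illustrative and not taken from the source.
FEATURE_ORDER = ["level", "hp", "mp", "equipment_id", "pet_id", "buff_count", "debuff_count"]


def encode_state(state: Dict[str, float]) -> List[float]:
    """Flatten a player's state into a fixed-length vector (missing fields become 0.0)."""
    return [float(state.get(name, 0.0)) for name in FEATURE_ORDER]


vector = encode_state({"level": 35, "hp": 820, "mp": 140, "buff_count": 2})
```

A fixed feature order keeps the vector layout stable between data collection and network training.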
In this embodiment, the game state data of the current player may be acquired according to an established game data model. This is only one implementation; any other implementation capable of acquiring the game state data of the current player may also be used.
S20, searching the action correspondingly executed by the game state data of the current player according to the preset Monte Carlo search tree until the whole round of game is finished; and the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search.
Referring to FIG. 2, the preset Monte Carlo search tree is as follows: each node of the Monte Carlo search tree corresponds to a state in the game, and the child nodes of a node represent the new states generated after taking the various available actions in that state, with each action corresponding to one child node. In this example, the actions available in each state include the player releasing different skills, using items, and so on. Because of the randomness of the game, the same action in the same state may produce different effects, so each child node is associated only with the action taken.
Tree node attributes are defined in the Monte Carlo search tree. The node attributes comprise the index of the parent node, the indexes of all child nodes, the number of times the node has been visited, the action probability value output for the current node by the pre-constructed action probability network, and the final profit value of the game after passing through the node, as given by the pre-constructed value network.
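The node attributes listed above could be held in a structure like the following sketch. The class name, field names and the flat list used as the tree are illustrative assumptions, not the patent's actual data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """One search-tree node; all field names are illustrative assumptions."""
    parent: Optional[int] = None                              # index of the parent node (None for the root)
    children: Dict[str, int] = field(default_factory=dict)    # action -> index of the child node
    visit_count: int = 0                                      # number of times the node has been visited
    prior: float = 0.0                                        # probability from the action probability network
    value: float = 0.0                                        # final profit estimate from the value network


def add_child(tree: List[TreeNode], parent_idx: int, action: str, prior: float) -> int:
    """Append a child node for `action` under `parent_idx` and return its index."""
    child_idx = len(tree)
    tree.append(TreeNode(parent=parent_idx, prior=prior))
    tree[parent_idx].children[action] = child_idx
    return child_idx


tree = [TreeNode()]  # index 0 is the root
skill_idx = add_child(tree, 0, "use_skill", prior=0.7)
item_idx = add_child(tree, 0, "use_item", prior=0.3)
```

Keying children by action reflects the statement that each child node is tied to the action taken rather than to one particular random outcome.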
In this embodiment, when simulating one game, a new Monte Carlo tree may be created at each step (i.e., each time an action is executed), or the previously created Monte Carlo tree may be reused, in which case the game data generated by the action actually taken becomes the root node, i.e., the game state data of the current player. This continues until the current game is finished, and a new Monte Carlo tree is created when the next game starts.
Specifically, when the next correspondingly executed action needs to be determined according to the game state data of the current player, the correspondingly executed action is searched through a preset Monte Carlo tree, the game data generated by the searched corresponding action is used as the game state data of the current player for the next time, and the correspondingly executed action is continuously searched until the simulated intelligent game is finished.
S30, storing the game information generated after each action executed during the whole round of game in a database; wherein the game information includes: the game state data of the current player, the probability labels for executing each action corresponding to that game state data, and the current game state value label.
In this embodiment, the current game state value label is as follows: a win, a loss and a tie are recorded as 1, -1 and 0, respectively. The action probability label is determined as follows: after the simulation is finished, the ratio of the number of times each child node was visited to the number of times its parent node was visited is taken as the action probability label of that child node.
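The two labels can be sketched as below. One assumption is made loudly: the parent's visit count is taken to be the sum of its children's visit counts, which the source does not state explicitly. The function names are hypothetical.

```python
from typing import Dict


def action_probability_labels(child_visits: Dict[str, int]) -> Dict[str, float]:
    """Ratio of each child's visit count to the parent's visit count.

    Assumption: the parent's visit count equals the sum of its children's counts.
    """
    parent_visits = sum(child_visits.values())
    if parent_visits == 0:
        raise ValueError("parent node has not been visited")
    return {action: n / parent_visits for action, n in child_visits.items()}


def value_label(result: str) -> int:
    """Map the game result to the value label: win -> 1, loss -> -1, tie -> 0."""
    return {"win": 1, "loss": -1, "tie": 0}[result]


labels = action_probability_labels({"use_skill": 30, "use_item": 10})
```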
Specifically, storing the game information corresponding to each action in the database expands the database, increases the diversity of training samples, improves generalization capability, and accelerates training.
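A small in-memory stand-in for such a database is sketched below, with uniform random sampling for training batches. The class name, the bounded capacity and the uniform sampling rule are assumptions for illustration; the source only says that game information is stored and later sampled.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# (state vector, action probability labels, value label)
Record = Tuple[List[float], Dict[str, float], int]


class SampleStore:
    """In-memory stand-in for the database (hypothetical, bounded FIFO)."""

    def __init__(self, capacity: int = 10000) -> None:
        self._records: Deque[Record] = deque(maxlen=capacity)

    def store(self, state: List[float], action_probs: Dict[str, float], value: int) -> None:
        self._records.append((state, action_probs, value))

    def sample(self, batch_size: int, rng: Optional[random.Random] = None) -> List[Record]:
        """Draw up to `batch_size` distinct records uniformly at random."""
        rng = rng or random.Random()
        k = min(batch_size, len(self._records))
        return rng.sample(list(self._records), k)

    def __len__(self) -> int:
        return len(self._records)


store = SampleStore(capacity=2)
store.store([1.0], {"use_skill": 1.0}, 1)
store.store([2.0], {"use_item": 1.0}, -1)
store.store([3.0], {"use_skill": 1.0}, 0)   # the oldest record is dropped
batch = store.sample(5, random.Random(0))
```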
S40, optimizing the loss value functions of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network.
The preset value network and action probability network loss value function is specifically as follows:
$L = \alpha L_v + \beta L_p + \eta\, w^{T} w$, wherein $L_v$ is the loss function of the value network with coefficient $\alpha$, $L_p$ is the loss function of the action probability network with coefficient $\beta$, and $w^{T} w$ is a regularization penalty term with coefficient $\eta$. The loss function of the value network is the mean square error between the value that the value network outputs for a given game state and the actual value label of that state.
Specifically, gradient descent is carried out in a deep learning back propagation mode until convergence, and an optimal value network and an optimal action probability network are obtained.
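The combined loss can be evaluated for a single sample as in the sketch below. The value term follows the squared-error description in the text; the form of the action probability loss is not given in the source, so cross-entropy against the action probability labels is used here purely as an assumption. The function name and default coefficients are hypothetical, and the gradient descent step itself is not shown.

```python
import math
from typing import Sequence


def combined_loss(
    v_pred: float,
    v_label: float,
    p_pred: Sequence[float],
    p_label: Sequence[float],
    weights: Sequence[float],
    alpha: float = 1.0,
    beta: float = 1.0,
    eta: float = 1e-4,
) -> float:
    """L = alpha * L_v + beta * L_p + eta * w^T w for one sample."""
    value_loss = (v_pred - v_label) ** 2                      # squared error of the value output
    # Assumption: cross-entropy between label and predicted action probabilities.
    policy_loss = -sum(t * math.log(p + 1e-12) for t, p in zip(p_label, p_pred))
    l2_penalty = sum(w * w for w in weights)                  # w^T w
    return alpha * value_loss + beta * policy_loss + eta * l2_penalty


loss = combined_loss(
    v_pred=0.5, v_label=1.0,
    p_pred=[1.0, 0.0], p_label=[1.0, 0.0],
    weights=[1.0, 2.0], eta=0.1,
)
```

In practice the same expression would be written in a deep learning framework so that back propagation can compute the gradients.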
In summary, the method acquires the game state data of the current player of an intelligent game and simulates the game; searches, according to a preset Monte Carlo search tree, for the action to be executed for the game state data of the current player, with the game data generated by that action serving as the current game state for the next search, until the whole round of game is finished; stores the game information generated after each action executed during the whole round of game in a database; and optimizes the loss value function of the pre-constructed value network and action probability network by a back propagation method according to sampling data acquired from the database, to obtain an optimal value network and an optimal action probability network. This simplifies the training process of reinforcement learning in an incomplete-information game, makes the behavior decisions of non-player characters more accurate, improves game quality, enhances playability, and improves user experience.
As an improvement of the above scheme, the action correspondingly executed by the game state data of the current player is searched according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search, and the method specifically comprises the following steps:
and determining a node corresponding to the game state data of the current player.
In this embodiment, the game state data of the current player is used as the root node of the preset Monte Carlo tree.
And searching at least one corresponding child node according to the node corresponding to the game state data of the current player.
In this embodiment, child nodes are accessed layer by layer according to the root node.
And selecting the best child node from at least one corresponding child node as the action executed under the game state data of the current player.
And selecting the child node with the most access times as the optimal child node.
Or, selecting the best child node from at least one corresponding child node according to a preset selection probability.
Specifically, the optimal child node is selected from at least one corresponding child node according to the preset selection probability, so that the randomness is increased, and the diversity of training samples is increased.
Or, selecting the child node with the maximum confidence value; wherein,
Figure BDA0002230763230000081
wherein UCB is the confidence value, N(v) is the number of visits to the current child node, N(v') is the number of visits to the child node, and E(Q(v)) is the average profit obtained by the current child node over the whole round of game.
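The three selection rules above can be sketched as follows. The formula image is not reproduced in this text, so the confidence value below uses the commonly used UCB1/UCT form (average profit plus an exploration bonus) as an assumption, with the parent's visit count in the logarithm and the child's visit count in the denominator. The exploration constant `c`, the use of visit counts as the preset selection probabilities, and all function names are likewise hypothetical.

```python
import math
import random
from typing import Dict, Optional, Tuple

Stats = Dict[str, Tuple[int, float]]  # action -> (visit count, average profit)


def select_most_visited(stats: Stats) -> str:
    """Rule 1: pick the child with the most visits."""
    return max(stats, key=lambda a: stats[a][0])


def select_by_probability(stats: Stats, rng: Optional[random.Random] = None) -> str:
    """Rule 2: sample a child; here the probability is proportional to visits (assumption)."""
    rng = rng or random.Random()
    actions = list(stats)
    weights = [stats[a][0] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def ucb_score(parent_visits: int, child_visits: int, mean_profit: float, c: float = 1.4) -> float:
    """Assumed UCB1 form: mean profit + c * sqrt(ln(parent visits) / child visits)."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_profit + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_max_ucb(stats: Stats, c: float = 1.4) -> str:
    """Rule 3: pick the child with the largest confidence value."""
    parent_visits = max(1, sum(n for n, _ in stats.values()))
    return max(stats, key=lambda a: ucb_score(parent_visits, stats[a][0], stats[a][1], c))


stats = {"use_skill": (10, 0.5), "use_item": (0, 0.0)}
```

Sampling instead of always taking the most visited child adds randomness, which the text notes increases the diversity of training samples.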
As an improvement of the above scheme, the simulating the intelligent game specifically includes:
and keeping the visible information unchanged, and performing random generation operation on the invisible information.
Illustratively, the game environment needs to be regenerated in each round of simulation: the player's own state and the visible part of the opponent's state remain unchanged, while the invisible part of the opponent's state is randomly generated. In this example, the parts of the opponent's state that cannot be seen include the items in use, certain special gain or reduction effects, and certain special skills that may be used. This invisible information is randomly regenerated in each new round of simulation, so the unknown part of the starting state is re-randomized every time.
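A minimal sketch of this per-round regeneration is shown below: visible fields are copied unchanged and each invisible field is drawn at random from a set of candidates. The dictionary layout, the candidate lists and the uniform draw are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def regenerate_state(
    visible: Dict[str, int],
    hidden_options: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Dict[str, object]:
    """Keep visible information unchanged and randomly generate invisible information."""
    rng = rng or random.Random()
    state: Dict[str, object] = dict(visible)      # copy, so the observed state is not modified
    for key, options in hidden_options.items():
        state[key] = rng.choice(options)          # assumption: uniform draw over candidates
    return state


visible = {"own_hp": 80, "enemy_hp": 60}
hidden_options = {"enemy_item": ["potion", "scroll"], "enemy_buff": ["none", "shield"]}
simulated = regenerate_state(visible, hidden_options, random.Random(1))
```

Calling the function again for each new simulation round yields a fresh guess for the invisible part while the visible part stays fixed.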
As an improvement of the above scheme, after the searching for the action data corresponding to the game state data of the current player according to the preset monte carlo search tree is completed until the whole round of game is finished, before the storing the game information corresponding to each action data executed by the whole round of game in the database, the method further includes:
and performing back propagation on the nodes corresponding to the game state data of at least one current player to respectively calculate the income value and the access times corresponding to each current game state.
In this embodiment, the result is propagated back from the current child node to the root node: according to the final player game state data, the profit value and the visit count of every node on the path that led to the result (win, loss or tie) are updated by backtracking. When the profit value is calculated, the average profit is updated iteratively using a moving-average calculation. The higher the profit value, the greater the potential profit of the corresponding player game state data.
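The backtracking update can be sketched as below. The "moving average" is interpreted here as an incremental running mean, which is an assumption, and the profit is applied with the same sign at every node (no alternation between players), which is also an assumption. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    parent: Optional[int]          # index of the parent node, None for the root
    visit_count: int = 0
    mean_profit: float = 0.0


def backpropagate(tree: List[Node], leaf_idx: int, profit: float) -> None:
    """Walk from the leaf to the root, updating visit counts and average profit."""
    idx: Optional[int] = leaf_idx
    while idx is not None:
        node = tree[idx]
        node.visit_count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        node.mean_profit += (profit - node.mean_profit) / node.visit_count
        idx = node.parent


tree = [Node(parent=None), Node(parent=0), Node(parent=1)]
backpropagate(tree, 2, 1.0)    # a win reached through nodes 0 -> 1 -> 2
backpropagate(tree, 1, -1.0)   # a loss reached through nodes 0 -> 1
```

The incremental form avoids storing every past result while giving the same average as recomputing it from scratch.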
As an improvement of the above, the method further comprises:
if the action data executed corresponding to the game state data of the current player cannot be found according to the preset Monte Carlo search tree and the whole round of game is not finished, the possibly executed actions corresponding to the game state data of the current player are expanded through a preset neural network, and the probability of each possibly executed action is recorded.
In this embodiment, the game state data of the current player is input into a preset neural network to obtain the value of that game state data and the execution probability of each available action. The current node is then expanded to obtain child nodes, one for each available action, and the probability value is recorded in each child node. The child nodes cover all operations that can be taken in the current game state, including releasing available skills and using items, and the probability corresponding to each operation is recorded during expansion.
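The expansion step can be sketched as below. The network is replaced by a stub that returns fixed numbers; its interface (a callable returning action probabilities and a state value) and all names are assumptions for illustration, not the patent's actual network.

```python
from typing import Callable, Dict, List, Tuple

Network = Callable[[List[float]], Tuple[Dict[str, float], float]]


def expand(children: Dict[str, Dict[str, float]], state: List[float], network: Network) -> float:
    """Create one child per available action, record its probability, and return the state value."""
    action_probs, state_value = network(state)
    for action, prob in action_probs.items():
        children[action] = {"prior": prob, "visits": 0.0, "mean_profit": 0.0}
    return state_value


def stub_network(state: List[float]) -> Tuple[Dict[str, float], float]:
    """Hypothetical stand-in for the preset neural network."""
    return {"release_skill": 0.6, "use_item": 0.4}, 0.1


children: Dict[str, Dict[str, float]] = {}
value = expand(children, [35.0, 820.0, 140.0], stub_network)
```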
Fig. 3 is a schematic structural diagram of an intelligent game processing device according to an embodiment of the present invention.
The embodiment of the invention correspondingly provides an intelligent game processing device, which comprises:
The obtaining module 10 is configured to obtain game state data of a current player of an intelligent game, and simulate the intelligent game.
The searching and matching module 20 is configured to search, according to a preset monte carlo search tree, an action that is executed corresponding to the game state data of the current player until the whole round of game is finished; and the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search.
The storage module 30 is configured to store the game information correspondingly generated after each action executed by the whole round of game to a database.
The construction module 40 is configured to optimize the loss value functions of the pre-constructed value network and the pre-constructed action probability network by using a back propagation method according to the sampling data acquired from the database, so as to obtain an optimal value network and an optimal action probability network.
To sum up, the intelligent game processing apparatus provided in the embodiments of the present invention acquires the game state data of the current player of an intelligent game and simulates the game; searches, according to a preset Monte Carlo search tree, for the action to be executed for the game state data of the current player, with the game data generated by that action serving as the current game state for the next search, until the whole round of game is finished; stores the game information generated after each action executed during the whole round of game in a database; and optimizes the loss value function of the pre-constructed value network and action probability network by a back propagation method according to sampling data obtained from the database, to obtain an optimal value network and an optimal action probability network. This simplifies the training process of reinforcement learning in an incomplete-information game, makes the behavior decisions of non-player characters more accurate, improves game quality, enhances playability, and improves user experience.
Referring to fig. 4, a schematic diagram of an intelligent game processing device according to an embodiment of the present invention is shown. The intelligent game processing device of this embodiment includes: a processor 11, a memory 12 and a computer program stored in said memory and executable on said processor. The processor implements the steps in the above-described respective intelligent game processing method embodiments when executing the computer program. Alternatively, the processor implements the functions of the modules/units in the above device embodiments when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the intelligent game processing device.
The intelligent game processing device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The intelligent game processing device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of an intelligent game processing device and does not constitute a limitation on it; the device may include more or fewer components than shown, may combine certain components, or may use different components. For example, the intelligent game processing device may also include input/output devices, network access devices, buses, and the like.
The processor 11 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the intelligent game processing device and connects the various parts of the whole intelligent game processing device using various interfaces and lines.
The memory 12 may be used to store the computer programs and/or modules, and the processor may implement the various functions of the intelligent game processing apparatus by running or executing the computer programs and/or modules stored in the memory, and by invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
If the modules/units integrated in the intelligent game processing device are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. An intelligent game processing method, comprising:
acquiring game state data of a current player of an intelligent game, and simulating the intelligent game;
searching the action correspondingly executed by the game state data of the current player according to a preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search;
storing game information correspondingly generated after each action executed by the whole round of game to a database;
and optimizing the loss value function of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network.
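The data flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy counting game, the helper names (`play_one_round`, `sample_batch`), the in-memory list standing in for the database, and the random stand-in for the Monte Carlo tree search are all assumptions made for illustration.

```python
import random

def play_one_round(initial_state, search_action, apply_action, is_over, result_of):
    """Play one whole round; each resulting state is the input of the next search."""
    state, records = initial_state, []
    while not is_over(state):
        action = search_action(state)        # stand-in for the Monte Carlo tree search
        records.append({"state": state, "action": action})
        state = apply_action(state, action)  # becomes the state for the next search
    outcome = result_of(state)
    for record in records:
        record["result"] = outcome           # attach the final result to every step
    return records

def sample_batch(database, batch_size, rng):
    """Draw training samples from the stored records."""
    return rng.sample(database, min(batch_size, len(database)))

# Hypothetical toy game: add 1 or 2 to a counter until it reaches at least 5.
rng = random.Random(0)
database = []  # a plain list stands in for the database
database.extend(play_one_round(
    initial_state=0,
    search_action=lambda s: rng.choice([1, 2]),
    apply_action=lambda s, a: s + a,
    is_over=lambda s: s >= 5,
    result_of=lambda s: 1 if s == 5 else -1,
))
batch = sample_batch(database, 2, rng)  # would be fed to the network optimizer
```

In a real system the sampled batch would be passed to a training step that updates the value network and the action probability network by back propagation; that step is omitted here.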
2. The intelligent game processing method according to claim 1, wherein the action correspondingly executed according to the game state data of the current player is searched according to a preset monte carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search, and the method specifically comprises the following steps:
determining a node corresponding to the game state data of the current player;
searching at least one corresponding child node according to the node corresponding to the game state data of the current player;
and selecting the best child node from at least one corresponding child node as the action executed under the game state data of the current player.
3. The intelligent game processing method of claim 2, wherein the selecting of the best child node from at least one of the corresponding child nodes is an action performed under the game state data of the current player; the method specifically comprises the following steps:
selecting the child node with the most visits as the best child node;
or,
selecting the best child node from at least one corresponding child node according to a preset selection probability;
or,
selecting the child node with the maximum confidence value; wherein,
[Formula image FDA0002230763220000021: definition of the confidence value UCB]
UCB is the confidence value, N(v) is the number of visits to the current child node, N(v') is the number of visits to the child node, and E(Q(v)) is the average profit obtained by the current child node over the whole round of game.
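The three selection rules of claim 3 can be sketched as follows. The confidence formula itself is an image that is not reproduced in this text, so the sketch assumes the widely used UCB1 form (mean return plus an exploration bonus based on parent and child visit counts); the exploration constant `c`, the dictionary layout of a child node, and all function names are hypothetical.

```python
import math
import random

def most_visited(children):
    """Rule 1: pick the child with the largest visit count."""
    return max(children, key=lambda ch: ch["visits"])

def sample_by_probability(children, probs, rng):
    """Rule 2: pick a child at random according to preset selection probabilities."""
    return rng.choices(children, weights=probs, k=1)[0]

def ucb(parent_visits, child_visits, mean_return, c=1.4):
    """Assumed UCB1 form; the patent's own formula may differ."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_return + c * math.sqrt(math.log(parent_visits) / child_visits)

def max_ucb(children, parent_visits, c=1.4):
    """Rule 3: pick the child with the largest confidence value."""
    return max(children, key=lambda ch: ucb(parent_visits, ch["visits"], ch["mean_return"], c))

children = [
    {"action": "a", "visits": 10, "mean_return": 0.5},
    {"action": "b", "visits": 2, "mean_return": 0.4},
    {"action": "c", "visits": 0, "mean_return": 0.0},
]
```

Under these assumptions, the most-visited rule favors exploitation, while the confidence rule gives rarely visited children a bonus so that they are still explored.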
4. The intelligent game processing method of claim 1, wherein the simulating the intelligent game specifically comprises:
and keeping the visible information unchanged, and performing random generation operation on the invisible information.
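For a card game, the simulation step of claim 4 could look like the following sketch: the cards the current player can see stay fixed, and the unseen cards are dealt at random to the hidden hands. The card encoding, the function name `determinize`, and its parameters are assumptions, not taken from the patent.

```python
import random

def determinize(full_deck, visible_cards, hidden_hand_sizes, rng):
    """Keep visible cards fixed and deal the unseen cards randomly to hidden hands."""
    unseen = [card for card in full_deck if card not in visible_cards]
    rng.shuffle(unseen)  # random generation of the invisible information
    hands, start = [], 0
    for size in hidden_hand_sizes:
        hands.append(unseen[start:start + size])
        start += size
    return hands

deck = list(range(10))    # hypothetical 10-card deck
visible = {0, 1, 2, 3}    # own hand plus cards already played
hands = determinize(deck, visible, [3, 3], random.Random(1))
```

Each call with a different random state yields a different plausible hidden layout, which lets a search written for fully observable games run on an imperfect-information game.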
5. The intelligent game processing method according to claim 1, wherein before the step of searching for the action correspondingly executed by the game state data of the current player according to the preset monte carlo search tree until the whole round of game is finished, and saving the game information correspondingly generated after each action executed by the whole round of game in the database, the method further comprises:
and performing back propagation on at least one node corresponding to the current game state data to respectively calculate the income value and the access times corresponding to each current game state.
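The back propagation of claim 5 can be sketched as a walk from a leaf node up to the root, updating the visit count and accumulated income value of every node on the path. The `Node` class and its field names are hypothetical, and per-player sign handling for multi-player games is deliberately omitted.

```python
class Node:
    """Minimal search-tree node holding the statistics updated by back propagation."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_value = 0.0

    @property
    def mean_value(self):
        return self.total_value / self.visits if self.visits else 0.0

def backpropagate(leaf, value):
    """Add one visit and the simulation result to every node from leaf to root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_value += value
        node = node.parent

root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 1.0)    # a simulation through the leaf ended with value +1
backpropagate(child, -1.0)  # a simulation through the child ended with value -1
```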
6. The intelligent game processing method of claim 1, the method further comprising:
if the action executed corresponding to the game state data of the current player cannot be found according to the preset Monte Carlo search tree and the whole round of game is not finished, the action which can be executed corresponding to the game state data of the current player is expanded through a preset neural network, and the probability of each action which can be executed is recorded.
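The expansion step of claim 6 can be sketched as follows, assuming the preset neural network is available as a function that maps a state to action probabilities. The node layout, the name `expand`, the stand-in `fake_policy`, and the renormalization over legal actions are illustrative assumptions.

```python
def expand(node, legal_actions, policy_fn):
    """Create one child per executable action and record its prior probability."""
    raw = policy_fn(node["state"])  # stand-in for the preset neural network
    total = sum(raw.get(action, 0.0) for action in legal_actions)
    for action in legal_actions:
        if total > 0:
            prior = raw.get(action, 0.0) / total  # renormalize over legal actions
        else:
            prior = 1.0 / len(legal_actions)      # fall back to a uniform prior
        node["children"][action] = {
            "state": None, "prior": prior,
            "visits": 0, "total_value": 0.0, "children": {},
        }
    return node

def fake_policy(state):
    """Hypothetical network output for any state."""
    return {"hit": 0.6, "stand": 0.2, "fold": 0.2}

root = {"state": (), "children": {}}
expand(root, ["hit", "stand"], fake_policy)
```

The recorded priors can then guide which newly created child is visited first in later searches.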
7. The intelligent game processing method of claim 1, wherein the game information comprises:
the game state data of the current player and the game state data of the current player correspond to probability labels for executing each action, current game state value labels and game results when the whole round of game is finished.
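One way to hold the game information of claim 7, and to turn its labels into a training signal, is sketched below. The patent does not give the form of the loss; the sketch assumes a squared error on the value plus a cross-entropy on the action probabilities, as commonly used with self-play training. The class `GameRecord`, its field names, and `combined_loss` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class GameRecord:
    state: Tuple[int, ...]          # game state data of the current player
    action_probs: Dict[str, float]  # probability label for each action
    state_value: float              # value label of the current game state
    result: float                   # game result at the end of the whole round

def combined_loss(record, predicted_value, predicted_probs, eps=1e-12):
    """Assumed loss: squared value error plus policy cross-entropy."""
    value_loss = (predicted_value - record.result) ** 2
    policy_loss = -sum(
        p * math.log(predicted_probs.get(action, 0.0) + eps)
        for action, p in record.action_probs.items()
    )
    return value_loss + policy_loss

record = GameRecord(state=(1, 0, 2), action_probs={"a": 1.0},
                    state_value=0.8, result=1.0)
good = combined_loss(record, 1.0, {"a": 1.0})
bad = combined_loss(record, 0.0, {"a": 0.5, "b": 0.5})
```

Here the final result serves as the value target; using the stored state value label instead would be an equally plausible reading of the claim.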
8. An intelligent game processing apparatus, comprising:
an acquisition module, configured to acquire game state data of a current player of an intelligent game and to simulate the intelligent game;
the searching and matching module is used for searching the action correspondingly executed by the game state data of the current player according to the preset Monte Carlo search tree until the whole round of game is finished; the game data generated by the corresponding action obtained by each search is used as the game state data of the current player in the next search;
the storage module is used for storing game information correspondingly generated after each action executed by the whole round of game to a database;
and the construction module is used for optimizing the loss value functions of the pre-constructed value network and the pre-constructed action probability network by adopting a back propagation method according to the sampling data acquired from the database to obtain the optimal value network and the optimal action probability network.
9. An intelligent game processing apparatus comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the intelligent game processing method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the intelligent game processing method according to any one of claims 1 to 7.
CN201910966855.2A 2019-10-12 2019-10-12 Intelligent game processing method, device, equipment and storage medium Active CN110772794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910966855.2A CN110772794B (en) 2019-10-12 2019-10-12 Intelligent game processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110772794A true CN110772794A (en) 2020-02-11
CN110772794B CN110772794B (en) 2023-06-16

Family

ID=69386163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910966855.2A Active CN110772794B (en) 2019-10-12 2019-10-12 Intelligent game processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110772794B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114425166A (en) * 2022-01-27 2022-05-03 北京字节跳动网络技术有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN115914687A (en) * 2022-12-28 2023-04-04 北京奇艺世纪科技有限公司 Video playing and adaptive code rate playing model training method and device
CN117982899A (en) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Data processing method, device, computer, storage medium and program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204412A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Optimal policy determination using repeated stackelberg games with unknown player preferences
JP2016224512A (en) * 2015-05-27 2016-12-28 株式会社日立製作所 Decision support system and decision making support method
CN106390456A (en) * 2016-09-30 2017-02-15 腾讯科技(深圳)有限公司 Generating method and generating device for role behaviors in game
CN107050839A (en) * 2017-04-14 2017-08-18 安徽大学 Amazon chess game playing by machine system based on UCT algorithms
CN109908584A (en) * 2019-03-13 2019-06-21 北京达佳互联信息技术有限公司 A kind of acquisition methods of game information, device and electronic equipment
CN110083748A (en) * 2019-04-30 2019-08-02 南京邮电大学 A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning
US10402723B1 (en) * 2018-09-11 2019-09-03 Cerebri AI Inc. Multi-stage machine-learning models to control path-dependent processes

Also Published As

Publication number Publication date
CN110772794B (en) 2023-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant