CN110489668A - Multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information - Google Patents
Multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information
- Publication number: CN110489668A (application CN201910860992.8A)
- Authority
- CN
- China
- Prior art keywords: action, game, player, opponent, information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- A63F 13/58 — Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
- A63F 13/67 — Generating or modifying game content adaptively or by learning from player actions, e.g. skill level adjustment or storing successful combat sequences for re-use
- A63F 13/70 — Game security or game management aspects
- G06F 16/2246 — Indexing structures; trees, e.g. B+ trees
- G06F 16/24566 — Query execution; recursive queries
- G06F 16/9536 — Search customisation based on social or collaborative filtering
- A63F 2300/60 — Methods for processing data by generating or executing the game program
- A63F 2300/6027 — Methods for processing data using adaptive systems learning from user actions, e.g. for skill level adjustment
- A63F 2300/64 — Methods for processing data for computing dynamical parameters of game objects
- A63F 2300/65 — Methods for processing data for computing the condition of a game character
Abstract
The invention discloses a multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information, comprising: S1: for the player's strategy, inferring the remaining hidden information from the known posterior information; S2: sampling all information before the game tree is expanded and screening out effective actions; S3: after the game-tree search, jointly training the search results of the individual game trees to predict the final dominant strategy; S4: building two game trees, one per player, which are linked to each other; in every round the trees are expanded simultaneously on identical pre-expansion samples, each tree starts its expansion from its own player's information set, and the opponent's action is obtained directly by mapping from the other tree.
Description
Technical field
The present invention relates to the technical field of machine game playing, and more particularly to a multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information.
Background technique
Machine game playing studies how to let a computer imitate human game play, and is one of the most challenging research directions in artificial intelligence. Many famous scholars have worked in this field, such as John von Neumann, the father of the computer; John McCarthy, the father of artificial intelligence; Claude E. Shannon, the founder of information theory; Norbert Wiener, the founder of cybernetics; and the famous computer scientist Alan Turing. Machine game playing is an abstraction and refinement of human games; it is simple and convenient, economical and practical, rich in content, and provides an ideal test bed for research on logical reasoning in artificial intelligence, earning it the name "the drosophila of artificial intelligence". Beyond its theoretical significance, machine game playing also has wide application prospects, especially in fields such as war simulation, city planning and network security. How to make game decisions intelligently, however, remains an open problem, whose solution depends on the development of machine game-playing theory and technology.
Game-tree search is the most effective way to solve machine game-playing problems: it searches for the best path in the game tree so as to maximize the overall payoff. However, the game trees of practical games are enormous, which makes searching them extremely difficult; the game-tree complexity of chess is about 10^123 and that of Go reaches 10^360, while the number of atoms on Earth is estimated at only about 10^132. In addition, in imperfect-information games the missing opponent information makes the states at the game-tree nodes highly uncertain, so expanding and solving the game tree becomes even harder. In short, machine game playing in complex environments is characterized by huge state spaces, unknown information and uncertain action payoffs; despite its broad application prospects, it faces enormous challenges.
Monte Carlo tree search based on sampling is mainly used to solve imperfect-information games of high complexity. Opponent modeling is another important research topic in imperfect-information machine game playing: there is a strong connection between the opponent's state information and the opponent's behavior. By building an opponent model, the opponent's state and behavior can be predicted, the state space compressed, and the information uncertainty reduced.
Current research on imperfect-information games focuses mainly on card games and mostly uses solution methods based on abstraction and equilibrium finding. These have a drawback: when the other side deviates from the equilibrium strategy or cheats, the optimal strategy cannot be obtained, and they are limited to two-player zero-sum games; for n-player games, cooperative games and synchronous games, existing algorithms still have many deficiencies. The present method therefore uses multiple trees to model the game from the angle of each player, learns from and extracts knowledge from the observed and hidden information of the game process, screens out effective information, compensates for the missing information under imperfect information, effectively estimates and predicts the opponent's state and decisions, improves the structure of the synchronous Monte Carlo search variant under imperfect information, and supplements the strategy of the game tree.
Summary of the invention
In view of the problems in the prior art, the invention discloses a multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information, which specifically comprises the following steps:
S1: For the player's strategy, infer the remaining hidden information from the known posterior information and screen out effective actions; then transfer the opponent-strategy estimation used in perfect-information games to the inferred and observed information of the imperfect-information game; in addition to the search, record the opponent's habitual action in each state and build a strategy auxiliary function;
S2: Sample all information before the game tree is expanded and screen out effective actions: record the actions the opponent executed in previous games, set a threshold according to actual needs, screen the actions whose payoff lies within that threshold, mark the actions with higher payoffs for the player and for the opponent, and build an action information library to store them;
S3: After the game-tree search, jointly train the search results of the individual game trees to predict the final dominant strategy: combine the search results, compare the results of the game trees built from different players' angles and from different action samples, and use a convergent decision method to choose the final value toward which the solutions of all game trees tend;
S4: Build two game trees, one per player, which are linked to each other. In every round the trees are expanded simultaneously, and the samples drawn before expansion are identical. Each tree starts its expansion from its own player's information set, while the opponent's action is obtained directly by mapping from the other tree; the player's tree and the opponent's tree search online together, communicating with and influencing each other. The purpose of this arrangement is to guarantee that the actions of the different players execute synchronously and that the state transition after each action execution is identical, fitting the characteristics of a synchronous game.
Sampling all information before the game tree is expanded and screening out effective actions specifically comprises a sample phase, a selection phase, an expansion phase, a simulation phase and an update phase, carried out as follows:
Sample phase: Before each expansion, randomly sample actions from the information library and expand the game tree only over the sampled subset. The types and number of actions sampled before each expansion are random, but the sample size is the same for every draw;
Selection phase: After the sample phase completes, each game tree, built from its player's angle, starts selecting according to the screened action information. Each player selects an action from the information set of the parent node of its own tree, while the opponent's action is mapped over from the selection result in the other tree;
Expansion phase: The state transition is generated only after both players' actions have been executed;
Simulation phase: After the state transition is generated, the trees simulate simultaneously; each tree evaluates only the actions of the player whose angle it represents;
Update phase: After evaluation, each tree backs up and updates the payoff and visit count of the action.
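The five phases above can be sketched as one synchronized round over two player trees. This is a minimal illustration under stated assumptions, not the patented implementation: the `PlayerTree` structure, the uniform selection policy over the sample, and the zero-sum payoff convention are all introduced here for concreteness.

```python
import random

class PlayerTree:
    """One Monte Carlo tree, built from a single player's viewpoint.

    Hypothetical minimal structure; the patent does not name these fields.
    """
    def __init__(self, player):
        self.player = player
        self.visits = {}   # action -> visit count N(a)
        self.returns = {}  # action -> cumulative payoff

    def select(self, sampled_actions):
        # Selection phase: pick this tree's own action from the sampled
        # subset; uniform choice stands in for the real selection policy.
        return random.choice(sampled_actions)

    def update(self, action, payoff):
        # Update phase: back up visit count and payoff for the action.
        self.visits[action] = self.visits.get(action, 0) + 1
        self.returns[action] = self.returns.get(action, 0.0) + payoff

def one_round(tree_a, tree_b, action_library, sample_size, simulate):
    """One synchronized expansion round over both trees (per S4).

    `simulate` is an assumed rollout function mapping a joint action pair
    to a payoff for player A (zero-sum assumed: B receives the negation).
    """
    # Sample phase: both trees see the SAME pre-expansion sample; the
    # library itself is untouched, so later rounds resample from it all.
    sample = random.sample(action_library, sample_size)
    # Selection: each tree picks only its own player's action; the
    # opponent's action is mapped over from the other tree, not searched.
    a = tree_a.select(sample)
    b = tree_b.select(sample)
    # Expansion + simulation: the joint move fires one synchronized
    # state transition, so both trees stay consistent.
    payoff_a = simulate(a, b)
    # Update: each tree backs up only its own player's action value.
    tree_a.update(a, payoff_a)
    tree_b.update(b, -payoff_a)
    return a, b, payoff_a
```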
Further, the prediction and estimation of decisions are carried out from two angles, the opponent's and the player's own (i.e., the sample phase estimates the selection situation of each action and predicts the opponent's tactics):
S31: First, from the player's angle, use prior knowledge: take the product of the selection probability of each action and the probability of that action in the given state as the numerator, and the probability of each state occurring as the denominator, to compute the probability of selecting each action in each of the player's states;
S32: From the opponent's angle, observe which actions the opponent selected most often in previous rounds; such actions are called habitual actions, and for them the player should select the corresponding action that obtains the best payoff;
S33: For a given state, determine the action the player should select from the player's angle by combining the player's own payoff with the prior knowledge;
The prediction and estimation of decisions proceed as follows. Let P(a) be the prior probability of action a appearing, and P(s|a) the probability of the state in which action a appeared in the previous game round. After normalizing the product of the two, choose the combination of this normalized prior probability and the action payoff U(s_i, a) with the maximum value. This maximum combines prior knowledge and action payoff; it is a kind of trust value the player places in an action, and the more an action is trusted, the larger the probability of choosing it. Because the actions of the player and the opponent correspond to each other as relative actions, their action counts in the game process are equal:

N(a_i) = N(a_j)

so it suffices to compute the opponent's most common action in the previous rounds, and an adjusting parameter λ is added directly to the habitual action and the action payoff in the strategy formula, where y denotes the mixed strategy and |A(I)| the number of actions in the player's action set; the strategy formula is a mixture of the trust in the opponent's habitual action and the player's own action.
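The trust-value computation can be sketched as follows. The normalization of P(a)·P(s|a) and the payoff term U(s_i, a) follow the text above, but the patent does not spell out how λ enters the formula, so the additive combination and the uniform λ/|A(I)| exploration term in the mixed strategy are assumptions, not the patented equation.

```python
def trust_scores(prior, state_given_action, payoff, lam):
    """Trust value per action, combining prior knowledge and payoff.

    prior:              P(a), prior probability of each action a
    state_given_action: P(s|a) for the state observed in the last round
    payoff:             U(s_i, a), payoff of action a in the current state
    lam:                the adjusting parameter λ; its additive use here
                        is an assumed reading of the text.
    """
    # Normalize the product P(a) * P(s|a) as the text describes.
    prod = {a: prior[a] * state_given_action[a] for a in prior}
    z = sum(prod.values()) or 1.0
    norm = {a: p / z for a, p in prod.items()}
    # Combine normalized habit probability with payoff via λ (assumed form).
    return {a: norm[a] + lam * payoff[a] for a in prior}

def mixed_strategy(scores, lam):
    """Mixed strategy y: weight the max-trust action, spreading λ/|A(I)|
    probability over all actions of the set A(I) (assumed form)."""
    n = len(scores)
    best = max(scores, key=scores.get)
    return {a: (1 - lam) * (1.0 if a == best else 0.0) + lam / n
            for a in scores}
```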
Solving imperfect-information synchronous games in this way is of greater practical significance. Multiple trees model the different players while retaining mapping relations between the trees; screening the sampled action information of the action library for effective actions avoids an overly large action space, which would make the game-tree search difficult, inefficient and of low quality. The mapping between the game trees during the search guarantees that the state transitions stay synchronized, perfectly fitting the characteristics of a synchronous game. Moreover, the game trees are searched after different samplings, and the solutions are compared and screened; the convergent decision method chooses the final value toward which the solutions of all game trees tend, guaranteeing the accuracy and reasonableness of the result, so that the final strategy is not decided one-sidedly by the error of a single tree's solution and the player's selection strategy and executed actions in the final game process become more reasonable. At the same time, by estimating information on both the opponent's and the player's level, the player does not select actions based on payoff alone: in a real game neither player chooses actions only by payoff, since habitual actions also arise from personal preference for certain actions. The opponent's habitual actions must therefore be taken into account: if, in some state, an opponent action appears with high frequency, that action is very likely a preferred or habitual action. Considering payoff and habitual actions together in the decision greatly improves the accuracy and flexibility of the strategy.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments recorded in the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is the technical solution diagram of the Monte Carlo tree search variant for imperfect-information synchronous games;
Fig. 2 is the technical route map of game knowledge extraction based on sampling;
Fig. 3 is a schematic diagram of the synchronous Monte Carlo search under imperfect information;
Fig. 4 is the opponent action-analysis modeling diagram under imperfect information.
Specific embodiment
To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention:
The multi-tree variant method of Monte Carlo tree search for synchronous games under imperfect information disclosed by the invention extracts the partially observable information and the information deliberately hidden by the players in an imperfect-information synchronous game, and adds it to the game tree to assist strategy selection. This compensates for the difficulty of solving an imperfect-information game directly, which stems from the incomplete knowledge of the information, while retaining the characteristics of a synchronous game: the tree structure is transformed so that trees built from the different players' angles are searched synchronously. The method specifically comprises the following steps:
Step 1-1: For the player's strategy, infer the remaining hidden information from the known posterior information, narrow the action range and screen out effective actions in order to reduce the search scale of the game tree; then transfer the opponent-strategy estimation of perfect-information games to the inferred and observed information of the imperfect-information game. In addition to the search, record the opponent's habitual action in each state and build a strategy auxiliary function.
Step 1-2: For the information observed and guessed during the game, in order to prevent the opponent from interfering with wrong information, the hidden information in the estimated opponent information must be made interference-proof: wrong information is removed and real information retained. The method is to sample all information before the game tree is expanded and to screen out effective actions: record the actions the opponent executed in previous games, set a threshold according to actual needs, screen the actions whose payoff lies within this threshold, mark the actions with higher payoffs for the player and the opponent, and build an action information library to store this information, in preparation for the game tree's screening of effective actions.
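Step 1-2's action information library can be sketched as follows. The aggregation by mean observed payoff and the "at or above the threshold" reading of "within the threshold" are assumptions introduced here; the patent only says a user-set threshold screens the recorded actions.

```python
from collections import defaultdict

class ActionLibrary:
    """Records actions seen in past games and screens them by payoff.

    Hypothetical minimal structure for the patent's action information
    library; field names and the mean-payoff rule are assumptions.
    """
    def __init__(self, threshold):
        self.threshold = threshold
        self.records = defaultdict(list)  # action -> observed payoffs

    def record(self, action, payoff):
        # Record one executed action and the payoff it produced.
        self.records[action].append(payoff)

    def screened(self):
        # Keep actions whose mean observed payoff clears the threshold;
        # these are the "effective actions" fed to the sample phase.
        return [a for a, ps in self.records.items()
                if sum(ps) / len(ps) >= self.threshold]
```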
Step 1-3: After the game-tree search, jointly train the search results of the individual game trees and predict the final dominant strategy. Combine the search results, compare the results of the trees built from different players' angles and from different action samples, and use a convergent decision method to choose the final value toward which the solutions of all game trees tend. This guarantees the accuracy and reasonableness of the result, prevents the final strategy from being decided one-sidedly by the error of a single tree's solution, and makes the player's selection strategy and executed actions in the final game process more reasonable.
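One simple reading of the "convergent decision method" in step 1-3 is a vote over the best actions recommended by the individual trees, keeping the recommendation most trees converge on. Majority vote is an assumption here, not the patented procedure.

```python
from collections import Counter

def convergent_decision(tree_recommendations):
    """Pick the recommendation most trees converge on.

    `tree_recommendations` is a list of best actions, one per searched
    tree (different player angles, different action samples); the
    most common one is taken as the final value the solutions tend to.
    """
    counts = Counter(tree_recommendations)
    action, _ = counts.most_common(1)[0]
    return action
```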
Step 1-4: The game trees are built as an improvement on the traditional Monte Carlo tree. Two trees are built, one per player, and the trees of the different players are linked to each other; in every round these trees are expanded simultaneously, and the samples drawn before expansion are identical. Each tree starts its expansion from its own player's information set, while the opponent's action is mapped over directly from the other tree; the player's tree and the opponent's tree search online together, communicating with and influencing each other. The purpose of this arrangement is to guarantee that the actions of the different players execute synchronously and that the state transition after each action execution is identical, fitting the characteristics of a synchronous game.
Step 2-1: Sample phase. Before each expansion, randomly sample actions from the information library and expand the game tree only over the sampled subset. The types and number of actions sampled before each expansion are random, but the sample size must be the same for every draw, for consistency. The samples are put back after sampling so that the next pre-expansion draw can use the information again; actions are sampled anew before each game-tree search, then again for the second search, and so on for N rounds. Because some information is left unsampled in every round, a certain flexibility is retained: the unsampled portion prevents the search solution from becoming mechanical and rigid, and the retained information prepares for other states that arise in the actual game against the player.
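The sample-phase draw described in step 2-1 amounts to taking a fixed-size random subset without consuming the library. A minimal sketch:

```python
import random

def sample_actions(library, k, rng=random):
    """Draw k actions at random from the action library (step 2-1).

    The library is not modified, so every later round resamples from
    the full library ("samples are put back"); with k < len(library),
    some actions are left unsampled each round, preserving the
    flexibility the text describes.
    """
    return rng.sample(library, k)
```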
Step 2-2: Selection phase. After the sample phase completes, each game tree, built from its player's angle, starts selecting according to the screened action information. Each player selects an action from the information set of the parent node of its own tree, while the opponent's action is mapped over from the selection result in the other tree.
Step 2-3: Expansion phase. The state transition is generated only after both players' actions have been executed. Because the actions are mapped between the trees and the transition fires only once both actions are in, the states of the two trees stay consistent with each other, which guarantees that the actions are synchronous: their order of execution has no influence, since the state does not change before both have executed.
Step 2-4: Simulation phase. After the state transition is generated, the games seen from the respective players' angles are simulated simultaneously; each tree evaluates only the actions of the player whose angle it represents.
Step 2-5: Update phase. After evaluation, each tree backs up and updates the payoff and visit count of the action. These processes are completed simultaneously across the trees.
Further, before the game decision, the opponent's situation is pre-estimated and the opponent's action strategy predicted, i.e., opponent modeling is carried out. The prediction and estimation of decisions are performed from two angles, the opponent's and the player's own.
Step 3-1: First, from the player's angle, use prior knowledge: take the product of the selection probability of each action and the probability of that action in the given state as the numerator, and the probability of each state occurring as the denominator, to compute the probability of selecting each action in each of the player's states. From the player's angle one may consider the action with the maximum payoff in one's current state, or consider from experience which action should be selected in a given state; if the action with the maximum selection probability in the experience knowledge is at the same time the action with the maximum payoff, the player will very probably select that action.
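The numerator/denominator description in step 3-1 amounts to Bayes' rule, P(a|s) = P(a)·P(s|a)/P(s). A minimal sketch, with all inputs being hypothetical estimated probabilities:

```python
def action_posterior(p_select, p_state_given_action, p_state):
    """Step 3-1: probability that each action is chosen in a given state.

    Numerator: selection probability of the action times the probability
    of the action in this state; denominator: probability of the state.
    """
    return {a: p_select[a] * p_state_given_action[a] / p_state
            for a in p_select}
```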
Step 3-2: From the opponent's angle, we can observe which actions the opponent selected most often previously; such actions are called habitual actions, and for them the player should select the corresponding action that obtains the best payoff. Because the count of the player's corresponding actions and the count of the opponent's habitual actions are equal, the action to execute can be inferred from these counts.
Step 3-3: For a given state, determine the action the player should select by combining the player's own payoff with the prior knowledge. In a real game, however, the opponent may not select actions mechanically by payoff and may have habitual actions of its own; considering habitual actions retains a certain capacity to respond to variations in the opponent's actions.
Step 3-4: The specific calculation is as follows. Let P(a) be the prior probability of action a appearing, and P(s|a) the probability of the state in which action a appeared in the previous game round. After normalizing the product of the two, choose the combination of this normalized prior probability and the action payoff U(s_i, a) with the maximum value; this maximum combines prior knowledge and action payoff and is a kind of trust value the player places in an action, and the more an action is trusted, the larger the probability of choosing it. Because the actions of the player and the opponent correspond to each other as relative actions, their action counts in the game process are equal:

N(a_i) = N(a_j)

so it suffices to compute the opponent's most common action in the previous rounds; an adjusting parameter λ is added directly to the habitual action and the action payoff in the strategy formula, where y denotes the mixed strategy and |A(I)| the number of actions in the player's action set, and the strategy formula is a mixture of the trust in the opponent's habitual action and the player's own action.
Most existing work on synchronous games studies search methods under perfect information, but in practice many synchronous games are imperfect, so solving imperfect-information synchronous games as this method does is of greater practical significance. Multiple trees model the different players while retaining mapping relations between the trees; screening the sampled action information of the action library for effective actions avoids an overly large action space, which would make the game-tree search difficult, inefficient and of low quality. The mapping between the game trees during the search guarantees the synchronization of the state transitions and perfectly fits the characteristics of a synchronous game. Moreover, the game trees are searched after different samplings, and the solutions are compared and screened; the convergent decision method chooses the final value toward which the solutions of all game trees tend, guaranteeing the accuracy and reasonableness of the result, so that the final strategy is not decided one-sidedly by the error of a single tree's solution and the player's selection strategy and executed actions in the final game process are more reasonable. At the same time, by estimating information on both the opponent's and the player's level, the player does not select actions based on payoff alone: in a real game neither player chooses actions only by payoff, since habitual actions also arise from personal preference for certain actions. The opponent's habitual actions must therefore be taken into account: if, in some state, an opponent action appears with high frequency, that action is very likely a preferred or habitual action. Considering payoff and habitual actions together in the decision greatly improves the accuracy and flexibility of the strategy.
Embodiment:
Fig. 1 is the overall technical solution diagram. The imperfect-information synchronous game problem is modeled over the players' game process and expanded and solved from each player's angle; on this basis, the multi-tree Monte Carlo search variant for imperfect-information games has the sample, selection, expansion, simulation and update phases, and the search of the trees is interrelated and synchronous. Before the search, key information is extracted and the opponent pre-estimated, and the uncertain problem is solved during the search.
First, the legal actions are screened to reduce the action space, and the action information remaining after screening is sampled. As in Fig. 2, the
search process is entered; then the search results of the multiple Monte Carlo tree variants are combined, and the most representative
solution value is elected.
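The screening-then-sampling step can be sketched as follows. The threshold-based filter follows the description of S2; the function names, the income log, and the concrete incomes are illustrative, not taken from the patent text.

```python
import random

def screen_actions(action_log, threshold):
    """Keep only actions whose recorded income meets the threshold,
    i.e. the 'legal action' screening that shrinks the action space.
    action_log maps action -> observed income (illustrative structure)."""
    return [a for a, income in action_log.items() if income >= threshold]

def sample_actions(legal_actions, k):
    """Randomly sample k actions (without replacement) from the screened
    pool before each tree expansion; the sample size is fixed per draw."""
    k = min(k, len(legal_actions))
    return random.sample(legal_actions, k)

log = {"raise": 5.0, "fold": -1.0, "call": 2.0, "bluff": 0.5}
pool = screen_actions(log, threshold=0.0)   # "fold" is screened out
hand = sample_actions(pool, k=2)            # expansion uses only these
```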
The specific search process of a game tree is shown in Fig. 3. The player and the opponent each sample from the action information bank,
reducing the action space for the expansion of the game tree. Player and opponent select according to the sampled information. Suppose the player executes
action a1 while the opponent executes action b1. After the respective actions have been executed, tree 1 maps action b1 over from tree 2; likewise, a1 is mapped into tree 2 from the other tree. This online mapping during the search improves its efficiency.
Once both trees have carried out both sides' actions, the state transfer is completed and the income is calculated, preserving the synchronous character.
Because information is missing under imperfect information, the opponent must be estimated in order to improve strategy formulation: useful information
reflecting the regularities of the opponent's states is retained, and the opponent's behavior is estimated from this key information. As shown in Fig. 4, the player first
weighs decision schemes using its own empirical knowledge, the historical game states, and its own action incomes; it then takes the opponent's angle,
observes the opponent's information, and infers the behavioral preferences the opponent has. The two aspects are combined to estimate the opponent and construct the selection strategy formula.
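The two-sided estimation just described can be sketched as a small blend of the player's own incomes with a bonus for countering the opponent's habitual action. The `best_reply` table, the `weight` parameter, and the rock-paper-scissors actions are hypothetical placeholders, not part of the patent.

```python
from collections import Counter

def habitual_action(history):
    """The opponent's most frequent past action is treated as habitual."""
    return Counter(history).most_common(1)[0][0]

def choose(own_income, opponent_history, weight=0.5):
    """Score each of the player's actions by its own income, plus a
    bonus (weight) for the action that best answers the opponent's
    habitual action; return the highest-scoring action."""
    habit = habitual_action(opponent_history)
    best_reply = {"rock": "paper", "paper": "scissors", "scissors": "rock"}[habit]
    scored = {a: inc + (weight if a == best_reply else 0.0)
              for a, inc in own_income.items()}
    return max(scored, key=scored.get)

# Incomes alone are a three-way tie; the opponent's rock habit breaks it.
pick = choose({"rock": 0.1, "paper": 0.1, "scissors": 0.1},
              ["rock", "rock", "paper", "rock"])
```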
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto.
Any equivalent substitution or change that a person skilled in the art makes, within the technical scope disclosed by the present invention, according to its
technical scheme and inventive concept, shall be covered by the protection scope of the present invention.
Claims (4)
1. A multi-game-tree variant Monte Carlo search method for simultaneous games under incomplete information, characterized by comprising:
S1: for the player's strategy, inferring the remaining information from the known posterior information and screening the legal actions; then transferring the opponent-strategy estimation mode of perfect-information
games into inference and observation of the information in the imperfect-information game; outside the search, recording
the opponent's habitual action in each state and establishing a strategy auxiliary function;
S2: before the game tree is expanded, sampling all information and screening the legal actions: recording the actions the opponent
executed in earlier game play, setting a threshold according to actual needs, screening the action incomes within the threshold, marking the actions of
high income for the player and the opponent, and establishing an action information bank to store them;
S3: after the game-tree search, training again on the search result of each game tree to predict the final dominant strategy: combining the
search results, comparing the search results of these game trees expanded from different angles and with different sampled actions, and
using a convergent decision method to select the final value toward which all game-tree solving results tend;
S4: setting up two game trees, one per player, wherein the game trees of the different players are interlinked; in every round the
multiple game trees are expanded simultaneously, the sample content before expansion is identical, and each tree expands actions from its own player's angle according to its own
information set, while the opponent's actions are obtained directly by mapping from the other tree.
2. The multi-game-tree variant Monte Carlo search method for simultaneous games under incomplete information according to claim 1, characterized in that
sampling all information before the game tree is expanded and screening the legal actions specifically comprises a sampling phase, a selection
phase, an expansion phase, a simulation phase, and an update phase, embodied as follows:
sampling phase: before each expansion, randomly sampling actions from the information bank and expanding the game tree only over the sampled actions;
the types of actions sampled before each expansion are random, while the number of samples
drawn each time is the same;
selection phase: in the game tree of each player's angle, after the sampling phase is completed, selection begins from the screened action
information; each player selects its action from the information set of the parent node of its own tree, while the opponent's action is
mapped over from the selection result in the other tree;
expansion phase: generating the action transfer after both players' actions have been carried out;
simulation phase: in the game tree of each player's angle, after the state transfer has been generated and simulated, each tree evaluates only the
actions of the player whose angle that tree takes;
update phase: after evaluation, backtracking through each tree to update the action incomes and the visit counts of the actions.
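The five phases of claim 2 can be sketched as one search iteration. The dictionary node layout, the UCB selection rule, and the stub rollout are illustrative assumptions; the claim fixes the phases but not these concrete choices.

```python
import math
import random

def mcts_iteration(tree, info_bank, sample_size, c=1.4):
    """One iteration: sample, select, expand, simulate, update."""
    # Sampling phase: draw a fixed-size random subset of screened actions.
    actions = random.sample(info_bank, min(sample_size, len(info_bank)))
    node = tree["root"]

    # Selection phase: UCB over the sampled actions (one common choice).
    def ucb(a):
        st = node["children"].setdefault(a, {"n": 0, "w": 0.0})
        if st["n"] == 0:
            return float("inf")  # always try unvisited actions first
        return st["w"] / st["n"] + c * math.sqrt(math.log(node["n"] + 1) / st["n"])

    a = max(actions, key=ucb)
    child = node["children"][a]

    # Expansion + simulation phases: a stub rollout returns a random income.
    value = random.random()

    # Update phase: back up the income and the visit counts.
    child["n"] += 1
    child["w"] += value
    node["n"] += 1
    return a, value

tree = {"root": {"n": 0, "w": 0.0, "children": {}}}
for _ in range(20):
    mcts_iteration(tree, ["a1", "a2", "a3"], sample_size=2)
```

A real rollout would play the sampled joint actions to a terminal state and score it; here the payoff is random purely to keep the sketch self-contained.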
3. The multi-game-tree variant Monte Carlo search method for simultaneous games under incomplete information according to claim 1, characterized in that
the prediction and estimation of the decision are carried out from both the opponent's angle and the player's own angle:
S31: first, from the player's angle, using prior knowledge, taking the product of each action's selection probability and the action's probability
in this state as the numerator, and the probability of each state occurring as the denominator, to compute the probability of selecting a given
action in each player-specific state;
S32: from the opponent's angle, observing which actions the opponent selected most often in earlier play; such actions are called habitual actions, and
against them the player should select the corresponding action that obtains the best income;
S33: from the player's angle, combining the player's own income and prior knowledge to determine which action the player should select for a given
state.
4. The multi-game-tree variant Monte Carlo search method for simultaneous games under incomplete information according to claim 3, characterized in that
the prediction and estimation of the decision are carried out specifically in the following way:
let P(a) be the prior probability of action a appearing, and P(s|a) the probability of the state in which action a appeared in the previous game round;
the product of the two is normalized, and the normalized prior probability is combined with the action income U(s_i, a), choosing the maximum of the combination, so that
the larger the income U(s_i, a) of an action, the larger the probability of choosing it; the action the opponent most
often performed in earlier rounds is computed, and an adjusting parameter λ is added directly between the habitual action and the action income, i.e., the strategy formula is as follows:

N(a_i) = N(a_j)

where y is the mixed strategy and |A(I)| is the number of actions in the player's action set; the strategy formula is a mixture of the opponent's habitual action and
the player's trusted action.
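Since the published text only names the ingredients (P(a), P(s|a), the income U, and the adjusting parameter λ) while the displayed formula is garbled, the concrete combination below is an assumption: normalized prior-weighted income plus a λ bonus on the habit-related action.

```python
def mixed_strategy(actions, P_a, P_s_given_a, U, habit_reply, lam=0.3):
    """Score each action by its normalized P(a) * P(s|a) weight times its
    income U(a), adding lam to the action that answers the opponent's
    habit. This exact mixture is assumed, not taken from the patent."""
    weights = {a: P_a[a] * P_s_given_a[a] for a in actions}
    z = sum(weights.values()) or 1.0  # normalization denominator
    scores = {a: (weights[a] / z) * U[a] + (lam if a == habit_reply else 0.0)
              for a in actions}
    return max(scores, key=scores.get)

best = mixed_strategy(
    ["a1", "a2"],
    P_a={"a1": 0.6, "a2": 0.4},
    P_s_given_a={"a1": 0.5, "a2": 0.5},
    U={"a1": 1.0, "a2": 2.0},
    habit_reply="a2",
)
```

With these numbers the normalized weights are 0.6 and 0.4; a2 scores 0.4 * 2.0 + 0.3 = 1.1 against a1's 0.6, so the habit bonus and the higher income together select a2.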
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910860992.8A CN110489668A (en) | 2019-09-11 | 2019-09-11 | Synchronous game monte carlo search sets mutation method more under non-complete information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489668A true CN110489668A (en) | 2019-11-22 |
Family
ID=68557628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910860992.8A Pending CN110489668A (en) | 2019-09-11 | 2019-09-11 | Synchronous game monte carlo search sets mutation method more under non-complete information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489668A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111905373A (en) * | 2020-07-23 | 2020-11-10 | 深圳艾文哲思科技有限公司 | Artificial intelligence decision method and system based on game theory and Nash equilibrium |
CN112463992A (en) * | 2021-02-04 | 2021-03-09 | 中至江西智能技术有限公司 | Decision-making auxiliary automatic question-answering method and system based on knowledge graph in mahjong field |
CN112463992B (en) * | 2021-02-04 | 2021-06-11 | 中至江西智能技术有限公司 | Decision-making auxiliary automatic question-answering method and system based on knowledge graph in mahjong field |
CN116039956A (en) * | 2022-11-02 | 2023-05-02 | 哈尔滨工业大学 | Spacecraft sequence game method, device and medium based on Monte Carlo tree search |
CN116039956B (en) * | 2022-11-02 | 2023-11-14 | 哈尔滨工业大学 | Spacecraft sequence game method, device and medium based on Monte Carlo tree search |
CN116039957A (en) * | 2022-12-30 | 2023-05-02 | 哈尔滨工业大学 | Spacecraft online game planning method, device and medium considering barrier constraint |
CN116039957B (en) * | 2022-12-30 | 2024-01-30 | 哈尔滨工业大学 | Spacecraft online game planning method, device and medium considering barrier constraint |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110489668A (en) | Synchronous game monte carlo search sets mutation method more under non-complete information | |
Ye et al. | Towards playing full moba games with deep reinforcement learning | |
Winands et al. | Monte Carlo tree search in lines of action | |
Samothrakis et al. | Fast approximate max-n monte carlo tree search for ms pac-man | |
CN108985458A (en) | A kind of double tree monte carlo search algorithms of sequential synchronous game | |
Gaina et al. | Population seeding techniques for rolling horizon evolution in general video game playing | |
Ramanujan et al. | Understanding sampling style adversarial search methods | |
Schauenberg | Opponent modelling and search in poker | |
Baier et al. | Guiding multiplayer MCTS by focusing on yourself | |
CN109002893A (en) | A kind of sequential synchronous sequence monte carlo search algorithm | |
Zhang et al. | AlphaZero | |
Heinrich et al. | Self-play Monte-Carlo tree search in computer poker | |
Barthet et al. | Go-blend behavior and affect | |
CN110727870A (en) | Novel single-tree Monte Carlo search method for sequential synchronous game | |
Fu | Markov decision processes, AlphaGo, and Monte Carlo tree search: Back to the future | |
Dobre et al. | Online learning and mining human play in complex games | |
Szczepański et al. | Case-based reasoning for improved micromanagement in Real-time strategy games. | |
Schadd et al. | Addressing NP-complete puzzles with Monte-Carlo methods | |
Maes et al. | Monte carlo search algorithm discovery for single-player games | |
Schaeffer et al. | Learning to play strong poker | |
Leece et al. | Sequential pattern mining in Starcraft: Brood War for short and long-term goals | |
Dobre et al. | Exploiting action categories in learning complex games | |
Liu et al. | An improved minimax-Q algorithm based on generalized policy iteration to solve a Chaser-Invader game | |
Gaina et al. | Project Thyia: A forever gameplayer | |
Ameneyro et al. | Playing carcassonne with monte carlo tree search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191122 |