CN110489668A - Multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information - Google Patents

Multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information

Info

Publication number
CN110489668A
CN110489668A (publication) CN201910860992.8A (application)
Authority
CN
China
Prior art keywords
movement
game
player
opponent
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910860992.8A
Other languages
Chinese (zh)
Inventor
潘家鑫
黄湛钧
高庆龙
王骄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910860992.8A priority Critical patent/CN110489668A/en
Publication of CN110489668A publication Critical patent/CN110489668A/en
Pending legal-status Critical Current

Links

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55 Controlling game characters or game objects based on the game progress
    • A63F13/58 Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70 Game security or game management aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/24566 Recursive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6027 Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/64 Methods for processing data by generating or executing the game program for computing dynamical parameters of game objects, e.g. motion determination or computation of frictional forces for a virtual car
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/65 Methods for processing data by generating or executing the game program for computing the condition of a game character

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information, comprising: S1: as part of the player's strategy, inferring the remaining hidden information from known posterior information; S2: sampling all information before the game tree is expanded and screening out the effective actions; S3: after the game-tree search, retraining the search results of each game tree to predict the final dominant strategy; S4: building two game trees, one for each player, that are linked to one another; in every round the trees are expanded simultaneously from identical samples; each tree expands actions from its own player's perspective according to that player's information set, while the opponent's actions are obtained directly by mapping from the other tree.

Description

Multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information
Technical field
The present invention relates to the technical field of machine game playing, and in particular to a multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information.
Background technique
Machine game playing, the study of how to make a computer simulate human game play, is one of the most challenging research directions in the field of artificial intelligence. Many famous scholars once worked in this field, such as John von Neumann, the father of the computer; John McCarthy, the father of artificial intelligence; Claude Shannon, the founder of information theory; Norbert Wiener, the founder of cybernetics; and the famous computer scientist Alan Turing. Machine game playing abstracts and refines human games; it is simple, convenient, economical, and practical, yet rich in connotation, and it provides an ideal experimental bed for artificial intelligence research on varied forms of logical thinking, earning it the name "the drosophila of artificial intelligence". Beyond its theoretical significance, machine game playing also has broad application prospects, especially in fields such as war simulation, urban planning, and network security. Realizing truly intelligent game decision-making, however, depends on further development of machine game theory and technology.
Game-tree search is the most effective method for solving machine game problems: it searches for the best path in the game tree so as to maximize the overall income. However, the game trees of practical game problems are enormous, which makes optimization over them extremely difficult; the game-tree complexity of chess is about 10^123 and that of go reaches 10^360, while the number of atoms on the whole Earth is estimated at only 10^132. In addition, in imperfect-information games the missing opponent information makes the states of the game-tree nodes highly uncertain, so expanding and solving the game tree becomes even more difficult. In short, machine game playing in complex environments is characterized by large state spaces, unknown information, and uncertain action incomes; although it has broad application prospects, it also faces enormous challenges.
Monte Carlo tree search based on sampling is mainly used to solve imperfect-information game problems of high complexity. Opponent modeling is another important research topic in imperfect-information games: in such games there is a strong connection between the opponent's state information and the opponent's behavior. By building an opponent model, the opponent's state and behavior can be predicted, shrinking the state space and reducing the uncertainty of the information.
Current research on imperfect-information games focuses mainly on card games and mostly uses solution methods based on abstraction and equilibrium seeking. These methods have a drawback: when the other side deviates from the equilibrium strategy or cheats, the optimal policy cannot be obtained, and they are limited to two-player zero-sum games. For n-player games, cooperative games, and synchronous games, existing algorithms still have many deficiencies. This invention therefore uses multiple trees to model the game from the perspectives of the different players; it learns from and extracts the information observed and hidden during the game process, screens the effective information, and makes up for the information missing under imperfect information. It effectively estimates and predicts the opponent's state and decisions, improves the structure of the synchronous Monte Carlo search variant under imperfect information, and supplements the strategy of the game tree.
Summary of the invention
In view of the problems in the prior art, the invention discloses a multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information, which specifically comprises the following steps:
S1: As part of the player's strategy, the remaining hidden information is inferred from known posterior information and the effective actions are screened. The opponent-strategy estimation used in perfect-information games is then transferred to the information inferred and observed in the imperfect-information game. Outside the search, the opponent's habitual action in each state is recorded and a strategy auxiliary function is established.
S2: All information is sampled before the game tree is expanded and the effective actions are screened. The actions the opponent executed in previous game processes are recorded; a threshold is set according to actual needs, the action incomes are screened against this threshold, the actions with higher income for the player and for the opponent are marked, and an action information library is established to store them.
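The threshold screening of S2 can be sketched as follows. This is a minimal illustration under stated assumptions: the action names, incomes, and threshold are hypothetical, and aggregating recorded incomes by averaging is an assumption, since the text does not fix the aggregation rule.

```python
def build_action_library(history, threshold):
    """Screen recorded (action, income) pairs from earlier games and keep
    only actions whose average income reaches the threshold (step S2).
    The averaging rule is an illustrative assumption."""
    totals, counts = {}, {}
    for action, income in history:
        totals[action] = totals.get(action, 0.0) + income
        counts[action] = counts.get(action, 0) + 1
    # Mark the higher-income actions and store them in the library
    return {a: totals[a] / counts[a]
            for a in totals if totals[a] / counts[a] >= threshold}

# Hypothetical recorded actions from previous games
history = [("advance", 3.0), ("advance", 5.0), ("retreat", -1.0),
           ("hold", 2.0), ("hold", 2.5)]
library = build_action_library(history, threshold=2.0)
# "advance" (average 4.0) and "hold" (average 2.25) pass; "retreat" is filtered out
```

The library produced here plays the role of the action information library that later sampling phases draw from.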
S3: After the game-tree search, the search results of each game tree are trained again to predict the final dominant strategy. The search results are combined, the results of the game trees built from the different players' perspectives and from different sampled actions are compared, and a convergent decision method is used to choose the final value toward which the solving results of all the game trees tend.
S4: Two game trees are built, one for each player, and the trees of the different players are linked to one another. In every round these game trees are expanded simultaneously, and the sample content before expansion is identical. Each tree expands actions from its own player's perspective according to that player's information set, while the opponent's actions are obtained directly by mapping from the other tree; the search over the player's tree and the opponent's tree is carried out online, with the trees communicating with and influencing one another. The purpose of this arrangement is to guarantee that the actions of the different players are executed synchronously and that the state transfer after each action execution is identical, matching the characteristics of a synchronous game.
Sampling all information before the game tree is expanded and screening the effective actions specifically comprises a sampling phase, a selection phase, an expansion phase, a simulation phase, and an update phase, implemented as follows:
Sampling phase: before each expansion, actions are randomly sampled from the information library; only some of the actions are sampled, and then the game tree is expanded. The type and number of actions sampled before each expansion are random, but the sample size is the same each time.
Selection phase: each game tree follows the perspective of its own player. After the sampling phase is completed, selection starts from the screened action information: each player selects an action from the information set of the upper node of its own tree, while the opponent's action is mapped from the result of the selection in the other tree.
Expansion phase: the action transfer is generated after the actions of both players have been carried out.
Simulation phase: from each player's perspective, simulation starts at the same time once the state transfer has been generated; each tree evaluates only the actions of the player whose perspective the tree takes.
Update phase: after evaluation, each tree backtracks and updates the action incomes and the visit counts of the actions.
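The five phases over the two linked trees can be sketched as one search round. This is a hedged skeleton, not the patented implementation: the UCB-style selection rule, the playout function, and all names are assumptions introduced for illustration; only the structure (identical sampling, selection from each tree's own information set, the opponent's action mapped from the partner tree, one synchronous transfer, per-tree valuation, backtracking update) follows the text.

```python
import math
import random

class Node:
    """One game-tree root holding per-action visit counts and incomes."""
    def __init__(self):
        self.visits = 0
        self.children = {}          # action -> [visits, total_income]

def select_action(node, actions, c=1.4):
    """Selection from this tree's own information set. The UCB rule and
    constant are assumptions; the patent does not specify the selector."""
    def score(a):
        v, inc = node.children.setdefault(a, [0, 0.0])
        if v == 0:
            return float("inf")
        return inc / v + c * math.sqrt(math.log(node.visits + 1) / v)
    return max(actions, key=score)

def search_round(tree_p, tree_o, lib_p, lib_o, simulate, k=2, rng=random):
    """One round of the two-tree variant covering all five phases."""
    acts_p = rng.sample(sorted(lib_p), min(k, len(lib_p)))   # sampling phase
    acts_o = rng.sample(sorted(lib_o), min(k, len(lib_o)))
    a_p = select_action(tree_p, acts_p)                      # selection phase
    a_o = select_action(tree_o, acts_o)
    # Each tree reads the opponent's action by mapping from the other tree,
    # so both actions are applied together: one synchronous state transfer.
    r_p, r_o = simulate(a_p, a_o)                            # simulation phase
    for tree, act, r in ((tree_p, a_p, r_p), (tree_o, a_o, r_o)):
        tree.visits += 1                                     # update phase:
        stats = tree.children[act]                           # backtrack income
        stats[0] += 1                                        # and visit counts
        stats[1] += r
    return a_p, a_o

# Hypothetical zero-sum playout giving each perspective its own income
def playout(a_p, a_o):
    gain = 1.0 if a_p == "press" else 0.5
    return gain, -gain

tree_p, tree_o = Node(), Node()
for _ in range(10):
    search_round(tree_p, tree_o, {"press", "wait"}, {"block", "flee"}, playout)
# After the rounds, both trees have identical visit totals: the searches
# over the two trees stay synchronized round by round.
```

Because both trees are updated inside the same round, the state transfer after each pair of actions is identical in both trees, which is the synchronization property the arrangement is meant to guarantee.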
Further, the prediction and estimation of the decision are carried out from two perspectives, the opponent's and the player's own (that is, the sampling phase estimates how often each action is selected and predicts the opponent's tactics):
S31: First, from the player's perspective, prior knowledge is used: the product of the select probability of each action and the probability of that action in the current state is taken as the numerator, and the probability of each state occurring is taken as the denominator, yielding the probability of selecting each action in each particular state of the player.
S32: From the opponent's perspective, the actions the opponent selected most often in previous rounds are observed; this kind of action is called a habitual action, and for it the player should select the corresponding action that obtains the optimal income.
S33: For a given state, the action the player should select is determined from the player's perspective by combining the player's own income with the prior knowledge.
The prediction and estimation of the decision proceed as follows. Let P(a) be the prior probability of action a appearing, and P(s|a) the probability of the state in which action a appeared in the previous game round. After normalizing their product, the combination of this normalized prior probability and the action income U(s_i, a) with the maximum value is chosen. This maximum combines prior knowledge and action income; it is a kind of trust value the player assigns to an action, and the more the action is trusted, the larger the probability of choosing it. Because the actions of the player and the opponent correspond to one another as relative actions, the action counts in the game process are equal, N(a_i) = N(a_j), so it is only necessary to count the opponent's most common action in the previous game rounds, and an adjusting parameter λ is added directly to the habitual action and the action income. In the resulting strategy formula (published as an image and therefore not reproduced here), y is the mixed strategy and |A(I)| is the number of actions in the player's action set; the formula mixes trust in the opponent's habitual action with the player's own action.
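Since the published strategy formula is an image, the sketch below shows only one plausible reading of the surrounding description: the normalized product P(a)·P(s|a) mixed with the income U(s, a) through the parameter λ. The additive mixing, all names, and all numbers are assumptions, not the patent's formula.

```python
def trust_values(actions, prior, state_given_action, income, lam=0.5):
    """Hypothetical trust value: normalized prior knowledge P(a) * P(s|a)
    mixed additively with action income U(s, a) via the parameter lambda.
    The additive form is an assumption; the patented formula is an image."""
    weights = {a: prior[a] * state_given_action[a] for a in actions}
    z = sum(weights.values()) or 1.0
    return {a: weights[a] / z + lam * income[a] for a in actions}

actions = ["raid", "defend"]
prior = {"raid": 0.6, "defend": 0.4}               # P(a): habit frequency
state_given_action = {"raid": 0.5, "defend": 0.5}  # P(s|a)
income = {"raid": 1.0, "defend": 0.2}              # U(s, a)
trust = trust_values(actions, prior, state_given_action, income)
best = max(trust, key=trust.get)                   # the most trusted action
# raid: 0.6 + 0.5 = 1.1; defend: 0.4 + 0.1 = 0.5, so "raid" is chosen
```

The larger the trust value, the larger the probability of choosing the action, matching the text's description of how prior knowledge and income combine.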
This method is of greater practical significance for solving imperfect-information synchronous game problems. Multiple trees are used to model the different players, and mapping relations are kept between the trees. The sampled action information in the action library is screened for effective actions, which avoids an oversized action space and the difficult, inefficient, low-quality game-tree search it would cause. During the search, the mapping between the game trees guarantees synchronized state transfer and matches the characteristics of a synchronous game perfectly. Moreover, the search is carried out after different samplings of the game tree, and the solving results are compared and screened; a convergent decision method chooses the final value toward which the solving results of all the game trees tend. This guarantees the accuracy and rationality of the result, prevents the final strategy from being decided one-sidedly by the error of a single tree, and makes the player's strategy selection and executed actions in the final game process more reasonable. At the same time, when estimating information at the two levels of opponent and player, the invention does not make the player select actions by income alone: in a real game neither player chooses actions purely by income, and habitual actions also arise from personal preference for certain actions. The opponent's habitual actions must therefore be taken into account; in a given state, a high frequency for some opponent action indicates that it is very likely a preferred or habitual action. Taking income and habitual actions into account simultaneously in the decision greatly enhances the accuracy and flexibility of the strategy.
Detailed description of the invention
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the technical scheme diagram of the imperfect-information synchronous Monte Carlo tree search variant;
Fig. 2 is the technical route map of sampling-based game knowledge extraction;
Fig. 3 is a schematic diagram of the imperfect-information synchronous Monte Carlo search;
Fig. 4 is a diagram of opponent action-analysis modeling under imperfect information.
Specific embodiment
To make the technical solution and advantages of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments:
The invention discloses a multi-tree variant method for Monte Carlo tree search in synchronous games under incomplete information. In an imperfect-information synchronous game, the method extracts the partial information that is observable and the information the players deliberately hide, and adds it to the game tree to assist strategy selection. This compensates for the difficulty of solving an imperfect-information game directly, where the information is not fully known. At the same time the characteristics of synchronous games are preserved: the trees are transformed so that a tree is built from each player's perspective and the search proceeds synchronously. The method specifically comprises the following steps:
Step 1-1: As part of the player's strategy, the remaining hidden information is inferred from known posterior information, the action range is narrowed, and the effective actions are screened in order to reduce the search scale of the game tree. The opponent-strategy estimation used in perfect-information games is then transferred to the information inferred and observed in the imperfect-information game. Outside the search, the opponent's habitual action in each state is recorded and a strategy auxiliary function is established.
Step 1-2: For the information observed and guessed during the game process, interference by the opponent through false information must be prevented, so the hidden information in the estimate of the opponent's information needs anti-interference treatment: false information is removed and real information retained. The method is to sample all information before the game tree is expanded and screen the effective actions. The actions the opponent executed in previous game processes are recorded, a threshold is set according to actual needs, the action incomes are screened against this threshold, the actions with higher income for the player and for the opponent are marked, and an action information library is established to store this information in preparation for the game tree's screening of effective actions.
Step 1-3: After the game-tree search, the search results of each game tree are trained again to predict the final dominant strategy. The search results are combined, the results of the trees built from the different players' perspectives and from different sampled actions are compared, and a convergent decision method is used to choose the final value toward which the solving results of all the game trees tend. This guarantees the accuracy and rationality of the result, prevents the final strategy from being decided one-sidedly by the error of a single tree, and makes the player's strategy selection and executed actions in the final game process more reasonable.
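The combination of per-tree results in step 1-3 might look like the sketch below. The patent does not spell out the convergent decision method, so the simple plurality rule here is an assumption chosen only to illustrate "choosing the value toward which the solving results of all the game trees tend"; the action names are hypothetical.

```python
from collections import Counter

def convergent_decision(tree_results):
    """Illustrative convergent decision: each game tree proposes the action
    its own search favors, and the value most trees' results tend toward is
    elected. A plurality vote is an assumption, not the patented method."""
    action, _ = Counter(tree_results).most_common(1)[0]
    return action

# Hypothetical solving results from trees built with different samplings
results = ["advance", "advance", "hold", "advance"]
final = convergent_decision(results)
# "advance" reflects the common tendency, so a single tree's error
# ("hold") cannot decide the final strategy one-sidedly
```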
Step 1-4: The game trees are built as an improvement on the traditional Monte Carlo tree: two trees are arranged according to the different players, and the trees of the different players are linked to one another. In every round these trees are expanded simultaneously, and the sample content before expansion is identical. Each tree expands actions from its own player's perspective according to that player's information set, while the opponent's actions are mapped directly from the other tree; the search over the player's tree and the opponent's tree is carried out online, with the trees communicating with and influencing one another. The purpose of this arrangement is to guarantee that the actions of the different players are executed synchronously and that the state transfer after each action execution is identical, matching the characteristics of a synchronous game.
Step 2-1: Sampling phase. Before each expansion, actions are randomly sampled from the information library; only some of the actions are sampled, and then the game tree is expanded. The type and number of actions sampled before each expansion are random, but the sample size must be the same each time so that consistency is maintained. After sampling, the samples are put back so that the tree's information can be used again next time; actions are sampled and put back again before each game-tree search, sampled again for the next search, and so on for N cycles. Because some information is left unsampled each time, a certain flexibility is retained: the unsampled portion prevents the search solution from becoming too mechanical and rigid, and the reserved information prepares for other states that arise in the actual game process.
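Step 2-1 can be sketched as repeated fixed-size subset sampling in which the library itself is never depleted; the action names and sizes below are hypothetical.

```python
import random

def sample_actions(action_library, k, rng=random):
    """Sampling phase of step 2-1: draw a fixed-size random subset of the
    action library for this search, leaving the library itself untouched
    so the drawn actions are effectively put back for the next search."""
    return rng.sample(list(action_library), k)

library = ["a1", "a2", "a3", "a4", "a5"]   # hypothetical screened actions
batch1 = sample_actions(library, 3)        # first search uses 3 actions
batch2 = sample_actions(library, 3)        # same sample size on the next search
# two of the five actions stay unsampled each round, retaining flexibility
# for states that only arise in the actual game process
```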
Step 2-2: Selection phase. Each game tree follows the perspective of its own player. After the sampling phase is completed, selection starts from the screened action information: each player selects an action from the information set of the upper node of its own tree, while the opponent's action is mapped from the result of the selection in the other tree.
Step 2-3: Expansion phase. The action transfer is generated after the actions of both players have been carried out. Because the actions are mapped to each other and the transfer is generated only after both actions have been executed, the states of the two trees remain consistent and the actions are guaranteed to be synchronous: since the state does not change in between, the order in which the actions are executed has no influence.
Step 2-4: Simulation phase. From each player's perspective, simulation starts at the same time once the state transfer has been generated; each tree evaluates only the actions of the player whose perspective the tree takes.
Step 2-5: Update phase. After evaluation, each tree backtracks and updates the action incomes and the visit counts of the actions. These processes are completed simultaneously across the trees.
Further, before the game decision, the opponent's status is pre-estimated and the opponent's action strategy is predicted; that is, opponent modeling is carried out. The prediction and estimation of the decision are made from two perspectives, the opponent's and the player's own.
Step 3-1: First, from the player's perspective, prior knowledge is used: the product of the select probability of each action and the probability of that action in the current state is taken as the numerator, and the probability of each state occurring is taken as the denominator, yielding the probability of selecting each action in each particular state of the player. From the player's perspective, one may consider the action with maximum income in the corresponding state, and also consider which action empirical knowledge says should be selected in that state; if the action with the highest select probability in the empirical knowledge is also the action with maximum income, the player will very probably select that action.
Step 3-2: From the opponent's perspective, the actions the opponent selected most often in previous rounds can be observed; this kind of action is called a habitual action. For such an action the player should select the corresponding action that obtains the optimal income; because the number of corresponding actions equals the number of the opponent's habitual actions, the action to execute can be inferred from the counts.
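A minimal sketch of step 3-2, assuming the opponent's past actions are available as a list: since the action counts correspond one-to-one (N(a_i) = N(a_j)), counting the opponent's most frequent past action identifies the habitual action. The action names are hypothetical.

```python
from collections import Counter

def habitual_action(opponent_history):
    """Return the opponent's most frequent action in earlier rounds and its
    count; per step 3-2 this is treated as the habitual action, to which the
    player responds with the corresponding optimal-income action."""
    action, count = Counter(opponent_history).most_common(1)[0]
    return action, count

# Hypothetical observed history of the opponent's moves
act, n = habitual_action(["push", "feint", "push", "push", "hold"])
# "push" appears 3 times, so it is taken as the habitual action
```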
Step 3-3: For a given state, the action the player should select is determined by combining the player's own income with the prior knowledge. In the actual game process, however, the opponent may not select actions mechanically by income and may also have habitual actions of its own; considering habitual actions reserves a certain proportion of responses to variations in the opponent's actions.
Step 3-4: The specific calculation is as follows. P(a) is the prior probability of action a appearing, and P(s|a) is the probability of the state in which action a appeared in the previous game round. After normalizing their product, the combination of the normalized prior probability and the action income U(s_i, a) with the maximum value is chosen. This maximum combines prior knowledge and action income; it is a kind of trust value the player assigns to an action, and the more the action is trusted, the larger the probability of choosing it. Because the actions of the player and the opponent correspond to one another as relative actions, the action counts in the game process are equal, N(a_i) = N(a_j), so it is only necessary to count the opponent's most common action in the previous game rounds, and an adjusting parameter λ is added directly to the habitual action and the action income. In the resulting strategy formula (published as an image and therefore not reproduced here), y is the mixed strategy and |A(I)| is the number of actions in the player's action set; the formula mixes trust in the opponent's habitual action with the player's own action.
Most existing work on synchronous games addresses search under perfect information, but in practice many synchronous games are imperfect-information games, so a method that solves the imperfect-information synchronous game problem is of greater practical significance. The present method models the different players with multiple trees and maintains mapping relations between these trees. It samples action information from an action library and screens out high-quality actions, which avoids the difficulty, inefficiency, and low solution quality of game-tree search caused by an overly large action space. The mapping between game trees during the search guarantees that state transitions remain synchronized, which fits the characteristics of synchronous games exactly. Moreover, each game tree carries out its search after a different sampling; the solving results are then compared and screened, and a convergent decision method selects a final value reflecting the tendency of all the game trees' solving results. This guarantees the accuracy and reasonableness of the result, so the final strategy is not decided unilaterally by the error of a single tree, and the player's strategy selection and executed actions in the final game process are more reasonable. At the same time, the invention estimates information at two levels, the opponent's and the player's, so the player does not select actions by income alone. In a real game process, players do not choose actions purely according to income; personal preference for certain actions also produces habitual actions. The opponent's habitual actions must therefore be taken into account: if, in a given state, some action of the opponent occurs with high frequency, that action is very likely a preferred or habitual action. Taking both income and habitual actions into account in decision-making greatly enhances the accuracy and flexibility of the strategy.
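As an illustrative sketch of the mixing described above (the symbols y, λ, P(a), P(s|a), U, and N(a) are taken from the description, but the exact combination below is an assumption, since the patent's formula is not fully reproduced), the mixed strategy over an income-based "trust" term and a habit-frequency term could be computed as:

```python
def mixed_strategy(actions, prior, state_given_action, income, habit_count, lam=0.3):
    """Blend an income-based 'trust' term with a habit-frequency term.

    prior[a]              ~ P(a), prior probability of action a
    state_given_action[a] ~ P(s|a) for the current state s
    income[a]             ~ U(s_i, a), recorded action income
    habit_count[a]        ~ N(a), how often the opponent played a
    lam                   ~ adjusting parameter weighting the habit term
    """
    # trust term: normalized product P(a) * P(s|a) combined with income U
    trust = {a: prior[a] * state_given_action[a] * income[a] for a in actions}
    z = sum(trust.values()) or 1.0
    trust = {a: v / z for a, v in trust.items()}
    # habit term: relative frequency of the opponent's past actions
    n = sum(habit_count[a] for a in actions) or 1.0
    habit = {a: habit_count[a] / n for a in actions}
    # mixed strategy y: convex combination controlled by lambda
    return {a: (1 - lam) * trust[a] + lam * habit[a] for a in actions}
```

Larger λ puts more weight on the opponent's observed habits, smaller λ on the income-based trust term; the returned dictionary sums to one and can be used directly as a selection distribution.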
Embodiment:
Fig. 1 is the overall technical solution diagram. The imperfect-information synchronous game problem involves modeling the players' game process and expanding the solution from each player's perspective. On this basis, the multi-tree variant of Monte Carlo search for imperfect-information games comprises sampling, selection, expansion, simulation, and update phases, and the trees and their searches are interrelated and synchronized. Before the search, key information is extracted and the opponent is pre-estimated; the uncertain problem is then solved within the search.
First, high-quality actions are screened to reduce the action space, and the screened action information is sampled. As shown in Fig. 2, the search process is then entered, the search results of the multiple Monte-Carlo-tree variants are combined, and the most representative solution value is elected.
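The screening-then-sampling step above can be sketched as follows (a minimal illustration under assumed data shapes — the function names, the dictionary-of-incomes representation, and the threshold semantics are this sketch's assumptions, not the patent's specification):

```python
import random

def screen_actions(history, threshold):
    """Keep only 'high-quality' actions whose recorded income meets the
    threshold; the surviving (action, income) pairs form the action library."""
    return {a: u for a, u in history.items() if u >= threshold}

def sample_actions(library, k, rng=None):
    """Randomly sample up to k actions from the screened library before
    each tree expansion, shrinking the action space the search must cover."""
    rng = rng or random.Random()
    pool = list(library)
    return rng.sample(pool, min(k, len(pool)))
```

Screening once and sampling a fixed-size subset before every expansion keeps each tree's branching factor small while still letting different expansions explore different parts of the library.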
The specific search process of the game trees is shown in Fig. 3. The player and the opponent each sample from the action information library, reducing the action space so that the game trees can be expanded. The player and the opponent select actions according to the sampled information. Suppose the player executes action a1 while the opponent executes action b1. After each side has executed its action, tree 1 maps action b1 over from tree 2, and likewise a1 is mapped into tree 2 from tree 1; this online mapping improves the efficiency of the search. After both trees have carried out both players' actions, the state transition is completed and the income is calculated, preserving the synchronous character of the game.
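The cross-mapping step between the two trees can be sketched as below (an assumed minimal structure: the class names, the placeholder selection policy, and the joint-action log are illustrative, not taken from the patent):

```python
class SyncTree:
    """One search tree per player; each tree chooses only its own player's
    action and receives the opponent's action by mapping from the twin tree."""
    def __init__(self, player):
        self.player = player
        self.log = []              # joint actions applied to this tree

    def select_own_action(self, sampled):
        return sampled[0]          # placeholder policy: first sampled action

    def apply_joint(self, own, mapped):
        self.log.append((own, mapped))   # complete the synchronized transition

def synchronized_step(tree1, tree2, sample1, sample2):
    a1 = tree1.select_own_action(sample1)   # player's move chosen in tree 1
    b1 = tree2.select_own_action(sample2)   # opponent's move chosen in tree 2
    tree1.apply_joint(a1, b1)   # b1 mapped into tree 1 from tree 2
    tree2.apply_joint(b1, a1)   # a1 mapped into tree 2 from tree 1
    return a1, b1
```

Because neither tree transitions until it has received the other tree's mapped action, both trees always apply the same joint action (a1, b1), which is what keeps the two searches synchronized.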
Because information is missing in the imperfect-information setting, the opponent must be estimated to improve strategy formulation: useful information reflecting the regularities of the opponent's states is retained, and the opponent's behavior is estimated from key information. As shown in Fig. 4, the player first considers a decision scheme based on its own empirical knowledge, the historical game states, and its own action incomes; it then takes the opponent's perspective, observes the opponent's information, and infers the behavioral preferences the opponent holds. Combining the two aspects, the opponent is estimated and the selection strategy formula is constructed.
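The frequency-based estimate of the opponent described above can be sketched as follows (an illustrative implementation under the assumption that play history is logged as (state, action) pairs; the scoring by the normalized product P(a)·P(s|a) follows the description, while the data layout is this sketch's choice):

```python
from collections import Counter

def estimate_opponent(history, state):
    """Estimate P(a) and P(s|a) from logged (state, action) pairs, then
    score each action by the normalized product P(a) * P(s|a)."""
    total = len(history)
    act_counts = Counter(a for _, a in history)
    pair_counts = Counter((s, a) for s, a in history)
    scores = {}
    for a, n_a in act_counts.items():
        p_a = n_a / total                            # prior probability P(a)
        p_s_given_a = pair_counts[(state, a)] / n_a  # P(s | a) from co-occurrence
        scores[a] = p_a * p_s_given_a
    z = sum(scores.values()) or 1.0
    return {a: v / z for a, v in scores.items()}     # normalized estimate
```

Actions the opponent plays often in the queried state receive high scores, which is exactly the "habitual action" signal the description says should feed into the strategy formula.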
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change that a person skilled in the art can make within the technical scope disclosed by the present invention, according to the technical scheme of the present invention and its inventive concept, shall be covered by the protection scope of the present invention.

Claims (4)

1. A multi-tree variant method of Monte Carlo search for synchronous games under imperfect information, characterized by comprising:
S1: for the player's strategy, inferring remaining information from known posterior information and screening high-quality actions; transferring the opponent-strategy estimation mode of perfect-information games into inference and observation of the information in the imperfect-information game; outside the search, recording the opponent's habitual actions in each state and establishing a strategy auxiliary function;
S2: before the game trees are expanded, sampling all information and screening high-quality actions: recording the actions executed by the opponent in previous game processes, setting a threshold according to actual needs, screening the action incomes within the threshold, marking actions with high income for the player and the opponent, and establishing an action information library for storage;
S3: after the game-tree search, training the search results of the individual game trees again to predict the final dominant strategy: combining the search results, comparing the results of the game trees built from the players' different perspectives and different sampled actions, and using a convergent decision method to select a final value reflecting the tendency of all the game trees' solving results;
S4: setting up two game trees according to the different players, wherein the game trees of the different players are interconnected; in every round the multiple game trees are expanded simultaneously with identical sampled content before expansion; each tree starts its expansion from its own player's perspective with actions drawn from its own information set, while the opponent's actions are obtained directly by mapping from the other tree.
2. The multi-tree variant method of Monte Carlo search for synchronous games under imperfect information according to claim 1, characterized in that sampling all information and screening high-quality actions before the game trees are expanded specifically comprises a sampling phase, a selection phase, an expansion phase, a simulation phase, and an update phase, implemented as follows:
Sampling phase: before each expansion, actions are randomly sampled from the information library, and only sampling is performed; the game trees are then expanded; the type and number of the actions sampled before each expansion are random, while the number of samples drawn each time is identical;
Selection phase: in the game tree of each player's perspective, after the sampling phase is completed, selection starts from the screened action information; each player selects its action from its own tree's information set at the upper node, while the opponent's action is mapped over from the selection result in the other tree;
Expansion phase: the state transition is generated after both players' actions have been carried out;
Simulation phase: the game from each player's perspective is simulated; after the state transition is generated, each tree evaluates only the actions of the player whose perspective that tree represents;
Update phase: after evaluation, the action incomes and action visit counts of each tree are updated by backtracking.
3. The multi-tree variant method of Monte Carlo search for synchronous games under imperfect information according to claim 1, characterized in that the prediction and estimation of decisions are carried out from two perspectives, the opponent's and the player's own:
S31: first, from the player's perspective, using prior knowledge, taking the product of each action's selection probability and the probability of that action in the current state as the numerator, and the probability of each state occurring as the denominator, to compute the probability of selecting a given action in each player-specific state;
S32: from the opponent's perspective, observing the actions the opponent selects most often in prior play; such actions are called habitual actions, and against them the player should select the corresponding action that obtains the optimal benefit;
S33: from the player's perspective, combining the player's own benefit and prior knowledge to determine the action the player should select for a given state.
4. The multi-tree variant method of Monte Carlo search for synchronous games under imperfect information according to claim 3, characterized in that:
The prediction and estimation of the decision are specific in the following way:
Let P(a) be the prior probability of action a appearing, and P(s|a) the probability of state s appearing given action a in a previous game round; the product of the two is normalized, and the normalized prior probability is combined with the action income U(s_i, a), choosing the maximum of the combination; if the income U(s_i, a) of some action is larger, the probability of choosing that action is larger. The action the opponent performs most often in preceding game rounds is computed, and an adjusting parameter λ is added directly between the habitual action and the action income, i.e., the strategy formula is as follows:
N(a_i) = N(a_j)
where y is the mixed strategy and |A(I)| is the number of actions in the player's action set; the strategy formula is a mixture of the opponent's habitual actions and the player's trusted actions.
CN201910860992.8A 2019-09-11 2019-09-11 Synchronous game monte carlo search sets mutation method more under non-complete information Pending CN110489668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860992.8A CN110489668A (en) 2019-09-11 2019-09-11 Synchronous game monte carlo search sets mutation method more under non-complete information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910860992.8A CN110489668A (en) 2019-09-11 2019-09-11 Synchronous game monte carlo search sets mutation method more under non-complete information

Publications (1)

Publication Number Publication Date
CN110489668A true CN110489668A (en) 2019-11-22

Family

ID=68557628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860992.8A Pending CN110489668A (en) 2019-09-11 2019-09-11 Synchronous game monte carlo search sets mutation method more under non-complete information

Country Status (1)

Country Link
CN (1) CN110489668A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111905373A (en) * 2020-07-23 2020-11-10 深圳艾文哲思科技有限公司 Artificial intelligence decision method and system based on game theory and Nash equilibrium
CN112463992A (en) * 2021-02-04 2021-03-09 中至江西智能技术有限公司 Decision-making auxiliary automatic question-answering method and system based on knowledge graph in mahjong field
CN112463992B (en) * 2021-02-04 2021-06-11 中至江西智能技术有限公司 Decision-making auxiliary automatic question-answering method and system based on knowledge graph in mahjong field
CN116039956A (en) * 2022-11-02 2023-05-02 哈尔滨工业大学 Spacecraft sequence game method, device and medium based on Monte Carlo tree search
CN116039956B (en) * 2022-11-02 2023-11-14 哈尔滨工业大学 Spacecraft sequence game method, device and medium based on Monte Carlo tree search
CN116039957A (en) * 2022-12-30 2023-05-02 哈尔滨工业大学 Spacecraft online game planning method, device and medium considering barrier constraint
CN116039957B (en) * 2022-12-30 2024-01-30 哈尔滨工业大学 Spacecraft online game planning method, device and medium considering barrier constraint

Similar Documents

Publication Publication Date Title
CN110489668A (en) Synchronous game monte carlo search sets mutation method more under non-complete information
Ye et al. Towards playing full moba games with deep reinforcement learning
Winands et al. Monte Carlo tree search in lines of action
Samothrakis et al. Fast approximate max-n monte carlo tree search for ms pac-man
CN108985458A (en) A kind of double tree monte carlo search algorithms of sequential synchronous game
Gaina et al. Population seeding techniques for rolling horizon evolution in general video game playing
Ramanujan et al. Understanding sampling style adversarial search methods
Schauenberg Opponent modelling and search in poker
Baier et al. Guiding multiplayer MCTS by focusing on yourself
CN109002893A (en) A kind of sequential synchronous sequence monte carlo search algorithm
Zhang et al. AlphaZero
Heinrich et al. Self-play Monte-Carlo tree search in computer poker
Barthet et al. Go-blend behavior and affect
CN110727870A (en) Novel single-tree Monte Carlo search method for sequential synchronous game
Fu Markov decision processes, AlphaGo, and Monte Carlo tree search: Back to the future
Dobre et al. Online learning and mining human play in complex games
Szczepański et al. Case-based reasoning for improved micromanagement in Real-time strategy games.
Schadd et al. Addressing NP-complete puzzles with Monte-Carlo methods
Maes et al. Monte carlo search algorithm discovery for single-player games
Schaeffer et al. Learning to play strong poker
Leece et al. Sequential pattern mining in Starcraft: Brood War for short and long-term goals
Dobre et al. Exploiting action categories in learning complex games
Liu et al. An improved minimax-Q algorithm based on generalized policy iteration to solve a Chaser-Invader game
Gaina et al. Project Thyia: A forever gameplayer
Ameneyro et al. Playing carcassonne with monte carlo tree search

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122