CN106296006A - Regret minimization evaluation method balancing risk and payoff in imperfect-information games - Google Patents
- Publication number
- CN106296006A (application CN201610658485.2A)
- Authority
- CN
- China
- Prior art keywords
- strategy
- risk
- regret
- sigma
- game
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a regret minimization evaluation method that balances risk and payoff in imperfect-information games, comprising the following steps. Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action. Step 2: play the game with the current strategy until the game is complete. Step 3: compute the value estimate and the regret value of each action on every information set visited in that game. Step 4: compute a new strategy with the regret matching algorithm. Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round. Step 6: return to step 2 until the game-playing process ends. The invention takes the concept of risk from economics, studies the principle of a risk model, and combines it with the regret minimization algorithm for use in imperfect-information machine game playing. While the regret minimization algorithm yields a payoff-first strategy, the method also accounts for the risk of that strategy, so as to reach the most reasonable Nash equilibrium.
Description
Technical field
The invention relates to the field of artificial intelligence, and in particular to a regret minimization evaluation method that balances risk and payoff in imperfect-information games.
Background
Artificial intelligence is an important branch of computer science. Its central task is to study how to make computers do work that previously could be done only with human intelligence. Machine game playing is an important research field of artificial intelligence and an important means of testing how far artificial intelligence has developed. Within machine game playing, imperfect-information games are one of the difficult and central research topics. A player in an imperfect-information game cannot obtain all the information and therefore cannot accurately predict which countermeasures the opponent will take. This resembles situations such as commercial competition and military conflict, so research on it has strong reference value for building decision support systems.
Summary of the invention
To solve the problems in the prior art, the invention provides a regret minimization evaluation method that balances risk and payoff in imperfect-information games, comprising the following steps:
Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action;
Step 2: play the game with the current strategy until the game is complete;
Step 3: compute the value estimate and the regret value of each action on every information set visited in that game;
Step 4: compute a new strategy with the regret matching algorithm;
Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round;
Step 6: return to step 2 until the game-playing process ends.
The beneficial effects of the invention are as follows:
The invention takes the concept of risk from economics, studies the principle of a risk model, and combines it with the regret minimization algorithm for use in imperfect-information machine game playing. While the regret minimization algorithm yields a payoff-first strategy, the method also accounts for the risk of that strategy, so as to reach the most reasonable Nash equilibrium.
Brief description of the drawings
Fig. 1 is the flow chart of the invention;
Fig. 2 shows the imperfect-information game process;
Fig. 3 is a schematic diagram of type I and type II risk loss in the risk model.
Detailed description
The invention is further described below with reference to the drawings.
First, the model of imperfect-information games and the basic concepts of the risk model are introduced.
An imperfect-information extensive-form game is a six-tuple $\langle N, H, P, f_c, \{L_i\}_{i=1,2,\ldots,N}, \{u_i\}_{i=1,2,\ldots,N} \rangle$, where $N$ is the finite set of players. $H$ is a finite set of action sequences: the empty sequence $\emptyset$ and every prefix of an action sequence in $H$ are also elements of $H$. The set of terminal sequences $Z \subseteq H$ consists of the sequences that are not a prefix of any other sequence in $H$. For a non-terminal sequence $h \in H$, $A(h) = \{a : ha \in H\}$ is the set of actions that can be taken after $h$. The function $P$ assigns a player to each non-terminal sequence, where $c$ denotes chance: $P(h)$ is the player whose turn it is to act after $h$, and if $P(h) = c$, chance determines the action after $h$. For player $i \in N$, $L_i$ denotes that player's information partition. The elements of the partition are called information sets; each information set is a subset of $H$ and contains action sequences that the player cannot tell apart. For information sets with $P(I) = c$, the function $f_c$ gives the probability $f_c(a \mid I)$ that each action $a$ in $A(h)$ occurs. For player $i \in N$, $u_i : Z \to \mathbb{R}$ is the utility function, which gives the player's payoff at each terminal sequence.
A strategy $\sigma_i$ of player $i$ assigns to each information set $I_i \in L_i$ a probability distribution $\sigma_i(I_i) : A(I_i) \to [0,1]$ over the action set $A(I_i)$. The strategy space of player $i$ is denoted $\Sigma_i$. A strategy profile contains the strategies of all players and is written $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$. $\sigma_{-i}$ denotes the strategy profile formed by the strategies of all players other than $i$.
Given a strategy profile $\sigma$ (that is, when every player chooses actions according to $\sigma$), the probability that action sequence $h$ occurs is defined as $\pi^\sigma(h)$. It decomposes into the product of each player's contribution to reaching $h$, i.e. $\pi^\sigma(h) = \prod_{i \in N \cup \{c\}} \pi_i^\sigma(h)$. Likewise, $\pi_{-i}^\sigma(h)$ denotes the product of the contributions of all players other than $i$. For two action sequences $h$ and $h'$, let $\pi^\sigma(h, h')$ be the transition probability from $h$ to $h'$ under $\sigma$: if $h$ is a prefix of $h'$, then $\pi^\sigma(h, h') = \pi^\sigma(h') / \pi^\sigma(h)$; otherwise $\pi^\sigma(h, h') = 0$. $\pi_i^\sigma(h, h')$ and $\pi_{-i}^\sigma(h, h')$ are defined similarly.
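The reach and transition probabilities can be illustrated with a minimal Python sketch. It assumes a history is a tuple of actions, that the transition probability is the ratio of the longer history's reach probability to the shorter one's, and that each player's strategy is a function of the prefix and the action. The names `reach_probability`, `transition_probability`, `toy_actor_of` and `toy_strategy_of` are hypothetical and do not come from the patent.

```python
def reach_probability(history, strategy_of, actor_of):
    """Return pi^sigma(h): the product, over the steps of h, of the acting
    player's probability of choosing that step's action."""
    probability = 1.0
    for k, action in enumerate(history):
        prefix = history[:k]
        actor = actor_of(prefix)  # who acts after this prefix
        probability *= strategy_of[actor](prefix, action)
    return probability


def transition_probability(h, h_prime, strategy_of, actor_of):
    """Return pi^sigma(h, h'): reach(h') / reach(h) if h is a prefix of h', else 0."""
    if h_prime[:len(h)] != h:
        return 0.0
    reach_h = reach_probability(h, strategy_of, actor_of)
    if reach_h == 0.0:
        return 0.0  # h is never reached, so no transition from it is observed
    return reach_probability(h_prime, strategy_of, actor_of) / reach_h


# Hypothetical toy game: players 0 and 1 alternate, each choosing "x" or "y".
def toy_actor_of(prefix):
    return len(prefix) % 2


toy_strategy_of = {
    0: lambda prefix, action: {"x": 0.5, "y": 0.5}[action],
    1: lambda prefix, action: {"x": 0.25, "y": 0.75}[action],
}
```

In the toy game, the history `("x", "y")` is reached with probability 0.5 × 0.75, and the transition from `("x",)` to `("x", "y")` is just player 1's probability of choosing `"y"`.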
In Fig. 2, the set $W$ is the set of all possible situations of the imperfect-information game environment $I$. Each element $w_i$ of $W$ represents one possible perfect-information state of $I$, and the true state of $I$ is one particular $w_i$ in $W$. This introduces the concept of a world: a world is one possible state of the imperfect-information game. $W$ is the world set of the current game state and $S$ is a sample set of $W$, $S \subseteq W$. The basic procedure of perfect-information Monte Carlo sampling is: draw a subset $S$ of $W$ at random, solve each perfect-information world $s_i$ in it, collect the optimal solution $m_i$ of each $s_i$, and finally select the final optimal strategy sequence from $M$.
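The sampling procedure can be sketched in Python as follows. The text does not say how the final sequence is selected from the per-world optima, so this sketch assumes a simple majority vote. The function `pimc_select`, the evaluation function `toy_evaluate`, and the candidate names are hypothetical.

```python
import random
from collections import Counter


def pimc_select(worlds, candidate_sequences, evaluate, sample_size, rng):
    """Sample worlds, find the best candidate in each, return the most frequent winner.

    The majority vote is an assumption of this sketch, not the patent's rule.
    """
    sample = rng.sample(worlds, sample_size)  # the sample set S, a subset of W
    best_per_world = [
        max(candidate_sequences, key=lambda m: evaluate(s, m)) for s in sample
    ]
    return Counter(best_per_world).most_common(1)[0][0]


# Hypothetical evaluation: "raise" pays the world's index, "fold" always pays 2.5.
def toy_evaluate(world, sequence):
    return float(world) if sequence == "raise" else 2.5


toy_choice = pimc_select(list(range(10)), ["raise", "fold"], toy_evaluate, 10, random.Random(0))
```

Because the toy sample covers all ten worlds, "raise" wins in the seven worlds numbered 3 to 9 and is selected.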
The uncertainty in strategy selection algorithms for machine game playing is attributed to the following two types of risk loss.
Type I risk loss and its calculation:
The risk loss caused by the evaluation function's inaccurate estimate of a world is called type I risk loss. Suppose the optimal strategy sequence of world $w$ is $m$; formula (1) then gives the type I risk loss of $m$.
Formula (1) uses two quantities: the evaluation function's payoff estimate for playing strategy sequence $m$ in world $w$, and the payoff estimate for playing $m$ in the real world.
Type II risk loss and its calculation:
The risk loss caused by an inaccurate judgment of the opponent's optimal strategy is called type II risk loss; formula (2) gives the type II risk loss of strategy sequence $m$.
Formula (2) uses two quantities: the evaluation function's payoff estimate for playing strategy sequence $m$ in the real world $I$, and the payoff estimate of the strategy sequence $m'$ that the two sides actually play in the real world $I$.
Fig. 3 illustrates the difference between the two types. The difference between the evaluation function's expected payoffs for strategy sequence $m$ in world $w$ and in the real world $I$ is the type I risk loss, labeled $L_{wI}$ in the figure. The difference between the expected payoffs of strategy sequence $m$ and the actual strategy sequence $m'$ in the real world $I$ is the type II risk loss, labeled $L_{mII}$ in the figure. The risk loss of using strategy sequence $m$ under world $w$ is therefore defined as
$L_{wm} = L_{wI} + L_{mII}$ (3).
The invention takes the concept of risk from economics, studies the principle of a risk model, and combines it with the regret minimization algorithm for use in imperfect-information machine game playing. While the regret minimization algorithm yields a payoff-first strategy, the method also accounts for the risk of that strategy, so as to reach the most reasonable Nash equilibrium.
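Each step of the invention is described in detail below with reference to Fig. 1. The basic steps are:
Step 1: Initialization. For player $i \in N$, on each of its information sets $I \in L_i$, set the value estimate of the strategy to $v(I, \sigma) = 0$; for each $a \in A(I)$, set the regret value $r(I, a) = 0$; and initialize the strategy to $\delta_i(I, a) = 1/|A(I)|$, so that every action is initially equally likely and the probabilities sum to 1.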
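As a minimal sketch of this initialization, assuming one dictionary per information set (the layout and the name `init_information_set` are illustrative choices, not part of the patent):

```python
def init_information_set(actions):
    """Create the per-information-set state used by step 1.

    value    : v(I, sigma), starts at 0
    regret   : r(I, a) for each action, starts at 0
    strategy : uniform probability 1/|A(I)| for each action
    """
    n = len(actions)
    return {
        "value": 0.0,
        "regret": {a: 0.0 for a in actions},
        "strategy": {a: 1.0 / n for a in actions},
    }


toy_node = init_information_set(["fold", "call", "raise"])
```

Each of the three toy actions starts with probability 1/3 and regret 0.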
Step 2: The players take turns acting according to their own strategies until the game ends, and the result of each game is recorded.
Step 3: Compute the value estimate and the regret value of each action on every information set visited in this game.
The value estimate at information set $I$ is given by formula (4).
The regret value for not having taken action $a$ at information set $I$ is given by formula (5).
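The two formulas of this step are not legible in this text. The symbol descriptions given in claim 3 (the utility $u(z)$, the opponents' reach probability of $z[I]$, the transition probability from $z[I]$ to $z$, a profile that always plays $a$ at $I$, and an average over $T$ iterations) match the standard counterfactual regret minimization definitions. As an assumption, not a transcription of the patent's own equations, those standard definitions read as follows, where $Z_I$ (a symbol introduced here) is the set of terminal sequences that pass through $I$:

```latex
% Assumed standard form of the value estimate at information set I
v_i(\sigma, I) = \sum_{z \in Z_I} \pi_{-i}^{\sigma}(z[I]) \, \pi^{\sigma}(z[I], z) \, u_i(z)

% Assumed standard form of the average regret for not taking action a at I
R_i^{T}(I, a) = \frac{1}{T} \sum_{t=1}^{T}
    \left( v_i(\sigma^{t}_{I \to a}, I) - v_i(\sigma^{t}, I) \right)
```

Here $\sigma^{t}_{I \to a}$ is the profile that equals $\sigma^{t}$ everywhere except at $I$, where action $a$ is always chosen.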
Step 4: Using the value estimates on the visited information sets obtained in the previous step, the regret matching algorithm reassigns a probability to each action on each information set, giving a new strategy. Compared with directly taking the action with the largest regret, this has the advantage that an opponent who performs the same regret computation cannot infer our strategy. The result is a payoff-first strategy.
For information set $I$, regret matching gives the payoff-first strategy for the next step by formula (6).
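The regret matching rule, as claim 4 describes it in words (normalize the positive cumulative regrets, otherwise fall back to the uniform strategy), can be sketched in Python. The function name `regret_matching` and the dictionary representation are assumptions of this sketch.

```python
def regret_matching(cumulative_regret):
    """Map cumulative regrets {action: R(I, a)} to next-round probabilities.

    Actions with positive regret get probability proportional to that regret;
    if no action has positive regret, the uniform strategy is returned.
    """
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: p / total for a, p in positive.items()}
    n = len(cumulative_regret)
    return {a: 1.0 / n for a in cumulative_regret}


toy_next = regret_matching({"fold": -2.0, "call": 1.0, "raise": 3.0})
```

With regrets of -2, 1 and 3, the negative regret is clipped to zero and the remaining probability is split 1:3 between "call" and "raise".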
Step 5: Compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round.
The influence of risk factors on the payoff is considered first.
Given the characteristics of imperfect-information machine game playing, a method for approximately computing the risk loss is proposed. Its basic idea is to compute the mean of the estimated payoffs over the sample set $S$ and use it in place of the true payoff of $I$ in the world set $W$.
Suppose the player's world set for the current state is $W$, with $n$ elements; the sample set of $W$ is $S$, with $t$ elements; and $M$ is the set of all legal strategy sequences of $W$, with $k$ elements. The average payoff is computed first.
Definition: the average payoff of the sample set $S$ is computed by formula (7).
Based on formula (7), the combined risk loss of strategy sequence $\delta$ is approximated by formula (8).
In formula (8), the terms joined by the approximately-equal sign are those approximated using the average payoff and the sample set $S$.
With this method, the risk value of the new strategy can be computed.
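The averaging idea can be sketched in Python. This covers only the sample mean of estimated payoffs, read here as a plain arithmetic mean over the $t$ sampled worlds; how the risk loss formula combines that mean with other terms is not assumed. The names `average_sample_payoff` and `toy_payoff` are hypothetical.

```python
def average_sample_payoff(sample, sequence, evaluate):
    """Mean of the evaluation function's payoff estimates for `sequence`
    over the sampled worlds, used as a stand-in for the real world's payoff."""
    if not sample:
        raise ValueError("the sample set S must not be empty")
    return sum(evaluate(s, sequence) for s in sample) / len(sample)


# Hypothetical evaluation function: payoff is twice the world's index.
def toy_payoff(world, sequence):
    return 2.0 * world


toy_mean = average_sample_payoff([1, 2, 3], "any-sequence", toy_payoff)
```

For the toy sample the estimates are 2, 4 and 6, so the mean is 4.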
Next, the relationship between payoff and risk is considered.
Suppose there are strategies A and B. $E_A$ and $E_B$ denote the player's expected payoffs for strategies A and B, $u_A$ and $u_B$ denote their payoff values, and $L_A$ and $L_B$ denote their risk losses. The rule for judging which of A and B is better is as follows:
1: If $u_A - L_A > u_B$, then A is better than B; conversely, if $u_B - L_B > u_A$, then B is better than A.
2: Otherwise, a value $R$ is computed by a further formula:
if $R > 0$, A is better than B; if $R < 0$, B is better than A; if $R = 0$, A and B are equally good and the system may choose between them at random.
With this method, the player's new and old strategies can be ranked. The top-ranked strategy is the current strategy that balances risk and payoff, that is, the player's optimal strategy.
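The two-stage comparison rule can be sketched in Python. Since the formula for $R$ is not legible in this text, the sketch takes it as a caller-supplied function; the stand-in `toy_r` used in the example is purely hypothetical and is not the patent's formula. The name `better_strategy` is also an assumption.

```python
import random


def better_strategy(u_a, loss_a, u_b, loss_b, compute_r, rng=random):
    """Return "A" or "B" following rule 1, then rule 2.

    compute_r is a placeholder for the patent's formula for R, which is
    not reproduced in the text; the caller must supply it.
    """
    # Rule 1: one strategy still beats the other after subtracting its own risk loss.
    if u_a - loss_a > u_b:
        return "A"
    if u_b - loss_b > u_a:
        return "B"
    # Rule 2: fall back to the sign of R; a tie is broken at random.
    r = compute_r(u_a, loss_a, u_b, loss_b)
    if r > 0:
        return "A"
    if r < 0:
        return "B"
    return rng.choice(["A", "B"])


# Hypothetical stand-in for R: difference of risk-adjusted payoffs.
def toy_r(u_a, loss_a, u_b, loss_b):
    return (u_a - loss_a) - (u_b - loss_b)
```

Passing the formula in as a function keeps the decision rule separate from the one part of it that the text leaves unspecified.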
Step 6: Check whether the whole game-playing process has ended; if not, return to step 2 and continue.
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the implementation of the invention is not to be regarded as limited to these descriptions. A person of ordinary skill in the art may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.
Claims (5)
- 1. A regret minimization evaluation method balancing risk and payoff in imperfect-information games, characterized by comprising the following steps: Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action; Step 2: play the game with the current strategy until the game is complete; Step 3: compute the value estimate and the regret value of each action on every information set visited in that game; Step 4: compute a new strategy with the regret matching algorithm; Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round; Step 6: return to step 2 until the game-playing process ends.
- 2. The regret minimization evaluation method balancing risk and payoff in imperfect-information games according to claim 1, characterized in that the initialization in step 1 is as follows: for player $i \in N$, on each of its information sets $I \in L_i$, the value estimate of the strategy is $v(I, \sigma) = 0$; for each $a \in A(I)$, the regret value of action $a$ on information set $I$ is $r(I, a) = 0$; and the strategy is initialized to $\delta_i(I, a) = 1/|A(I)|$, meaning that initially every action is equally likely and the probabilities sum to 1; where $N$ is the finite set of players, $L_i$ is the information partition of player $i$, $I$ is an information set, $\sigma$ is a strategy profile, and $a$ is an action.
- 3. The regret minimization evaluation method balancing risk and payoff in imperfect-information games according to claim 2, characterized in that, in step 3, the value estimate at information set $I$ is given by formula (4) and the regret value for not having taken action $a$ at information set $I$ is given by formula (5), where $z$ is a sequence in the set of terminal sequences; $u(z)$ is the actual utility value on reaching the terminal state of the game; $z[I]$ is the prefix of terminal sequence $z$ at information set $I$; $\pi_{-i}^\sigma(z[I])$ is the probability that all opponents of player $i$ reach $z[I]$; $\pi^\sigma(z[I], z)$ is the transition probability of all players from the history $z[I]$ to $z$; and $\sigma^t_{I \to a}$ is a strategy profile identical to $\sigma^t$ except that action $a$ is always chosen at information set $I$. Formula (5) computes player $i$'s average regret for action $a$ over $T$ iterations.
- 4. The regret minimization evaluation method balancing risk and payoff in imperfect-information games according to claim 3, characterized in that, in step 4, using the value estimates on the visited information sets obtained in the previous step, the regret matching algorithm reassigns a probability to each action on each information set, giving a new strategy and thus a payoff-first strategy; for information set $I$, regret matching gives the payoff-first strategy for the next step by formula (6), whose meaning is: when the cumulative regret is positive, it is normalized by the total regret and the strategy is updated in proportion; otherwise the strategy for the next iteration is the initial uniform strategy; where $R$ is the regret accumulated over $T$ rounds, $a$ is an action, $I$ is an information set, and the result of formula (6) is the probability with which player $i$ uses action $a$ at information set $I$ in the next round (round $T+1$).
- 5. The regret minimization evaluation method balancing risk and payoff in imperfect-information games according to claim 4, characterized in that, in step 5, given the characteristics of imperfect-information machine game playing, a method for approximately computing the risk loss is proposed, whose basic idea is to compute the mean of the estimated payoffs over the sample set $S$ and use it in place of the true payoff of $I$ in the world set $W$; suppose the player's world set for the current state is $W$, with $n$ elements, the sample set of $W$ is $S$, with $t$ elements, and $M$ is the set of all legal strategy sequences of $W$, with $k$ elements; the average payoff of the sample set $S$ is computed by formula (7); based on formula (7), the combined risk loss of strategy sequence $\delta$ is approximated by formula (8), in which the terms joined by the approximately-equal sign are those approximated using the average payoff and the sample set $S$; the risk value of the new strategy is computed by this method; the relationship between payoff and risk is then considered: suppose there are strategies A and B, where $E_A$ and $E_B$ denote the player's expected payoffs for A and B and $L_A$ and $L_B$ denote their risk losses; the rule for judging which is better is: 1: if $u_A - L_A > u_B$, A is better than B; conversely, if $u_B - L_B > u_A$, B is better than A; 2: otherwise, a value $R$ is computed by a further formula, and if $R > 0$, A is better than B, if $R < 0$, B is better than A, and if $R = 0$, A and B are equally good and the system may choose at random; by this method the player's new and old strategies are ranked, and the top-ranked strategy is taken as the current risk-dominant strategy, that is, the player's optimal strategy; where $R$ denotes risk, $L_A$ and $L_B$ denote the risk losses of strategies A and B, and $u_A$ and $u_B$ denote the actual payoff values of strategies A and B.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610658485.2A CN106296006A (en) | 2016-08-10 | 2016-08-10 | Regret minimization evaluation method balancing risk and payoff in imperfect-information games |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610658485.2A CN106296006A (en) | 2016-08-10 | 2016-08-10 | Regret minimization evaluation method balancing risk and payoff in imperfect-information games |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106296006A true CN106296006A (en) | 2017-01-04 |
Family
ID=57668611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610658485.2A Pending CN106296006A (en) | Regret minimization evaluation method balancing risk and payoff in imperfect-information games | 2016-08-10 | 2016-08-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106296006A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109829566A (en) * | 2018-12-26 | 2019-05-31 | 中国人民解放军国防科技大学 | Method for generating combat action sequence |
CN110404265A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | A kind of non-complete information machine game method of more people based on game final phase of a chess game online resolution, device, system and storage medium |
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game |
CN110599051A (en) * | 2019-09-19 | 2019-12-20 | 桂林电子科技大学 | Sub-game perfect Nash balanced fetching method of two agents |
CN110772798A (en) * | 2019-10-23 | 2020-02-11 | 桂林电子科技大学 | Method for searching Nash equilibrium sequence based on FIP structure |
KR102133143B1 (en) * | 2019-01-17 | 2020-07-13 | 알리바바 그룹 홀딩 리미티드 | Strategic search in strategic interaction between parties |
CN111905373A (en) * | 2020-07-23 | 2020-11-10 | 深圳艾文哲思科技有限公司 | Artificial intelligence decision method and system based on game theory and Nash equilibrium |
WO2020227960A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of an execution device |
WO2020227958A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of execution device |
WO2020227954A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of an execution device |
CN112041811A (en) * | 2019-12-12 | 2020-12-04 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
CN112041875A (en) * | 2019-12-12 | 2020-12-04 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
CN112149824A (en) * | 2020-09-15 | 2020-12-29 | 支付宝(杭州)信息技术有限公司 | Method and device for updating recommendation model by game theory |
CN112639841A (en) * | 2019-01-17 | 2021-04-09 | 创新先进技术有限公司 | Sampling scheme for policy search in multi-party policy interaction |
US11144841B2 (en) | 2019-12-12 | 2021-10-12 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
US11157316B1 (en) | 2020-04-02 | 2021-10-26 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
US11204803B2 (en) | 2020-04-02 | 2021-12-21 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
CN114580642A (en) * | 2022-03-17 | 2022-06-03 | 中国科学院自动化研究所 | Method, device, equipment and medium for constructing game AI model and processing data |
-
2016
- 2016-08-10 CN CN201610658485.2A patent/CN106296006A/en active Pending
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109829566A (en) * | 2018-12-26 | 2019-05-31 | 中国人民解放军国防科技大学 | Method for generating combat action sequence |
CN112639841B (en) * | 2019-01-17 | 2024-02-06 | 创新先进技术有限公司 | Sampling scheme for policy searching in multiparty policy interactions |
RU2743626C1 (en) * | 2019-01-17 | 2021-02-20 | Эдванст Нью Текнолоджиз Ко., Лтд. | Strategy search in strategic interaction between parties |
CN112639841A (en) * | 2019-01-17 | 2021-04-09 | 创新先进技术有限公司 | Sampling scheme for policy search in multi-party policy interaction |
CN112292701A (en) * | 2019-01-17 | 2021-01-29 | 创新先进技术有限公司 | Conducting policy search in multi-party policy interaction |
KR102133143B1 (en) * | 2019-01-17 | 2020-07-13 | 알리바바 그룹 홀딩 리미티드 | Strategic search in strategic interaction between parties |
WO2020147075A1 (en) * | 2019-01-17 | 2020-07-23 | Alibaba Group Holding Limited | Strategy searching in strategic interaction between parties |
CN112292699A (en) * | 2019-05-15 | 2021-01-29 | 创新先进技术有限公司 | Determining action selection guidelines for an execution device |
WO2020227960A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of an execution device |
WO2020227958A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of execution device |
WO2020227954A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced New Technologies Co., Ltd. | Determining action selection policies of an execution device |
CN112292698A (en) * | 2019-05-15 | 2021-01-29 | 创新先进技术有限公司 | Determining action selection guidelines for an execution device |
CN112292696A (en) * | 2019-05-15 | 2021-01-29 | 创新先进技术有限公司 | Determining action selection guidelines for an execution device |
CN110404264B (en) * | 2019-07-25 | 2022-11-01 | 哈尔滨工业大学(深圳) | Multi-person non-complete information game strategy solving method, device and system based on virtual self-game and storage medium |
CN110404265B (en) * | 2019-07-25 | 2022-11-01 | 哈尔滨工业大学(深圳) | Multi-user non-complete information machine game method, device and system based on game incomplete on-line resolving and storage medium |
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game |
CN110404265A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | A kind of non-complete information machine game method of more people based on game final phase of a chess game online resolution, device, system and storage medium |
CN110599051A (en) * | 2019-09-19 | 2019-12-20 | 桂林电子科技大学 | Sub-game perfect Nash balanced fetching method of two agents |
CN110772798A (en) * | 2019-10-23 | 2020-02-11 | 桂林电子科技大学 | Method for searching Nash equilibrium sequence based on FIP structure |
US11113619B2 (en) | 2019-12-12 | 2021-09-07 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
US11077368B2 (en) | 2019-12-12 | 2021-08-03 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
US11144841B2 (en) | 2019-12-12 | 2021-10-12 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
CN112041875B (en) * | 2019-12-12 | 2022-04-22 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
CN112041811B (en) * | 2019-12-12 | 2022-09-16 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
CN112041875A (en) * | 2019-12-12 | 2020-12-04 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
CN112041811A (en) * | 2019-12-12 | 2020-12-04 | 支付宝(杭州)信息技术有限公司 | Determining action selection guidelines for an execution device |
US11157316B1 (en) | 2020-04-02 | 2021-10-26 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
US11204803B2 (en) | 2020-04-02 | 2021-12-21 | Alipay (Hangzhou) Information Technology Co., Ltd. | Determining action selection policies of an execution device |
CN111905373A (en) * | 2020-07-23 | 2020-11-10 | 深圳艾文哲思科技有限公司 | Artificial intelligence decision method and system based on game theory and Nash equilibrium |
CN112149824A (en) * | 2020-09-15 | 2020-12-29 | 支付宝(杭州)信息技术有限公司 | Method and device for updating recommendation model by game theory |
CN114580642A (en) * | 2022-03-17 | 2022-06-03 | 中国科学院自动化研究所 | Method, device, equipment and medium for constructing game AI model and processing data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20170104 |