CN106296006A - Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games

Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games

Info

Publication number
CN106296006A
Authority
CN
China
Prior art keywords
strategy
risk
regret
sigma
game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610658485.2A
Other languages
Chinese (zh)
Inventor
王轩
蒋琳
张加佳
滕雯娟
代佳宁
王鹏程
胡开亮
林云川
朱航宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201610658485.2A
Publication of CN106296006A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a minimum-regret evaluation method for balancing risk and payoff in imperfect-information games, comprising the following steps. Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action. Step 2: play the game with the current strategy until the game is completed. Step 3: compute the value estimate and the regret value of each action at every information set visited during the game. Step 4: compute a new strategy with the regret-matching algorithm. Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round. Step 6: return to step 2 until the game-playing process ends. The invention borrows the concept of risk from economics, studies the principle of the risk model, and combines it with the regret-minimization algorithm for use in imperfect-information machine game playing. While regret minimization supplies a payoff-first strategy, the method also takes the risk of the strategy into account, so as to reach a more reasonable Nash equilibrium.

Description

Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games
Technical field
The invention relates to the field of artificial intelligence, and in particular to a minimum-regret evaluation method for balancing risk and payoff in imperfect-information games.
Background
Artificial intelligence is an important branch of computer science. Its central task is to study how to make computers do work that previously only human intelligence could accomplish. Machine game playing is an important research area of artificial intelligence and an important means of testing its level of development. Within machine game playing, imperfect-information games are one of the difficult points and focal points of research. Because a player in an imperfect-information game cannot obtain all of the information, it cannot accurately predict which countermeasure the opponent will take. This resembles situations in society such as commercial competition and military conflict, so the research has strong reference value for building decision support systems for society.
Summary of the invention
To solve the problems in the prior art, the invention provides a minimum-regret evaluation method for balancing risk and payoff in imperfect-information games, comprising the following steps:
Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action;
Step 2: play the game with the current strategy until the game is completed;
Step 3: compute the value estimate and the regret value of each action at every information set visited during the game;
Step 4: compute a new strategy with the regret-matching algorithm;
Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round;
Step 6: return to step 2 until the game-playing process ends.
The beneficial effects of the invention are as follows:
The invention borrows the concept of risk from economics, studies the principle of the risk model, and combines it with the regret-minimization algorithm for use in imperfect-information machine game playing. While regret minimization supplies a payoff-first strategy, the method also takes the risk of the strategy into account, so as to reach a more reasonable Nash equilibrium.
Brief description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 shows the imperfect-information game process;
Fig. 3 is a schematic diagram of the Type I and Type II risk losses in the risk model.
Detailed description
The invention is further described below with reference to the accompanying drawings.
First, the model of an imperfect-information game and the basic concepts of the risk model are introduced.
An imperfect-information extensive-form game is a six-tuple ⟨N, H, P, f_c, {L_i}_{i=1,2,...,N}, {u_i}_{i=1,2,...,N}⟩, where N is the finite set of players and H is a finite set of action sequences; the empty sequence ∅ and every prefix of an action sequence in H are also elements of H. The set of terminal sequences Z ⊆ H consists of the sequences that are not a prefix of any other sequence in H. For a non-terminal sequence h ∈ H, A(h) = {a : ha ∈ H} is the set of actions that can be taken after h. The function P assigns to each non-terminal sequence a member of N ∪ {c}, where c denotes chance: P(h) is the player whose turn it is after h, and if P(h) = c, chance determines the action after h. For each player i ∈ N, L_i is the player's information partition; the elements of an information partition are called information sets, and each information set is a subset of H containing action sequences that the player cannot tell apart. The function f_c gives, for every information set with P(I) = c, the probability f_c(a | I) that each action a in A(h) occurs. For each player i ∈ N, u_i : Z → R is the utility function, which returns the payoff at each terminal sequence.
A strategy σ_i of player i assigns to each information set I_i ∈ L_i a probability distribution σ_i(I_i) : A(I_i) → [0, 1] over the action set A(I_i). The strategy space of player i is denoted Σ_i. A strategy profile contains the strategies of all players and is written σ = (σ_1, σ_2, ..., σ_N); σ_{-i} denotes the profile made up of the strategies of all players other than i.
Given a strategy profile σ (that is, when every player selects actions according to σ), the probability that action sequence h occurs is denoted π^σ(h). π^σ(h) decomposes into the product of each player's contribution to reaching h, i.e. π^σ(h) = ∏_{i ∈ N ∪ {c}} π_i^σ(h). Likewise, π_{-i}^σ(h) is the product of the contributions of everyone other than player i, chance included. For two action sequences h and h', let π^σ(h, h') be the transition probability from h to h' under σ: if h is a prefix of h', then π^σ(h, h') = π^σ(h') / π^σ(h); otherwise π^σ(h, h') = 0. π_i^σ(h, h') and π_{-i}^σ(h, h') are defined analogously.
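The reach and transition probabilities can be illustrated with a short Python sketch. The data layout (a history as a list of `(actor, info_set, action)` triples and a nested `strategy_of` dictionary) and the three-step example history are assumptions made for illustration only; the patent does not prescribe any representation. The transition probability is computed here as the product of the action probabilities on the segment that leads from h to h'.

```python
from math import prod

def reach_probability(history, strategy_of):
    """pi^sigma(h): product of the probabilities of every action along h."""
    return prod(strategy_of[actor][info][action] for actor, info, action in history)

def player_contribution(history, strategy_of, player):
    """pi_i^sigma(h): only the factors contributed by `player`."""
    return prod(strategy_of[a][i][x] for a, i, x in history if a == player)

def transition_probability(h, h_prime, strategy_of):
    """pi^sigma(h, h'): probability of continuing from h to h'; 0 if h is not a prefix of h'."""
    if h_prime[:len(h)] != h:
        return 0.0
    return reach_probability(h_prime[len(h):], strategy_of)

# Hypothetical example: chance "c" deals a card, player 1 bets, player 2 calls.
strategy_of = {
    "c": {"root": {"deal_K": 0.5}},
    1: {"K": {"bet": 0.4}},
    2: {"facing_bet": {"call": 0.25}},
}
h = [("c", "root", "deal_K"), (1, "K", "bet"), (2, "facing_bet", "call")]
print(reach_probability(h, strategy_of))
print(transition_probability(h[:1], h, strategy_of))
```

Keeping the per-actor factors separate is what later makes it possible to weight a terminal payoff by the opponents' contribution only.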
The set W in Fig. 2 is the set of all possible situations of the imperfect-information game environment I. Each element w_i of W represents one possible perfect-information state of I, and the true state of I is some w_i in W. This introduces the concept of a world: a world is one possible state of the imperfect-information game. W is the set of worlds for the current game state, and S is a sample set of W, S ⊆ W. The basic procedure of perfect-information Monte Carlo sampling is to draw a subset S of W at random, solve each perfect-information world s_i in S, collect statistics over the optimal solution m_i of each s_i, and finally select the final optimal strategy sequence from M.
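A minimal Python sketch of this sampling procedure follows. The function name `pimc_choose`, the `evaluate(world, move)` callback, and the use of a majority vote as the "statistical analysis" are all assumptions; the source does not say how the per-world optima are aggregated.

```python
import random
from collections import Counter

def pimc_choose(worlds, candidate_moves, evaluate, sample_size, rng=None):
    """Sample worlds, solve each one, and return the most frequent per-world optimum."""
    rng = rng or random.Random()
    # Draw the sample set S from the world set W.
    sample = rng.sample(worlds, min(sample_size, len(worlds)))
    votes = Counter()
    for world in sample:
        # m_i: the best candidate in this perfect-information world.
        best = max(candidate_moves, key=lambda m: evaluate(world, m))
        votes[best] += 1
    # Assumed aggregation rule: majority vote over the per-world optima.
    return votes.most_common(1)[0][0]

# Hypothetical toy setting: raising is right in worlds 3..9, folding in worlds 0..2.
worlds = list(range(10))
evaluate = lambda w, m: 1.0 if (m == "raise") == (w >= 3) else 0.0
choice = pimc_choose(worlds, ["fold", "raise"], evaluate, sample_size=10)
```

With `sample_size` equal to the number of worlds, every world is visited once, so the toy call is deterministic regardless of the random generator.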
The uncertainty in strategy-selection algorithms for machine game playing is attributed to the following two types of risk loss.
Type I risk loss and its calculation:
The risk loss caused by the evaluation function's inaccurate valuation of a world is called the Type I risk loss. Assuming the optimal strategy sequence of world w is m, the Type I risk loss of m is calculated as:
$$L_{wI} = E_{wm} - E_{Im} \tag{1}$$
In the formula above, E_{wm} is the evaluation function's payoff estimate for taking strategy sequence m in world w, and E_{Im} is the payoff estimate for taking strategy sequence m in the real world I.
Type II risk loss and its calculation:
The risk loss caused by inaccurately judging the opponent's optimal strategy is called the Type II risk loss. The Type II risk loss of strategy sequence m is calculated as:
$$L_{mII} = E_{Im} - E_{Im'} \tag{2}$$
E_{Im} is the evaluation function's payoff estimate for taking strategy sequence m in the real world I, and E_{Im'} is the payoff estimate of the strategy sequence m' actually played by both sides in the real world I.
Fig. 3 illustrates the difference between the Type I and Type II risk losses. The difference between the evaluation function's expected-payoff estimates for strategy sequence m in world w and in the real world I is the Type I risk loss, L_{wI} in the figure. The difference between the expected payoffs of strategy sequence m and of the actual strategy sequence m' in the real world I is the Type II risk loss, L_{mII} in the figure. The risk loss of using strategy sequence m in world w is therefore defined as
$$L_{wm} = L_{wI} + L_{mII} \tag{3}$$
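A small worked example may help fix the signs. It reads the Type I loss as the difference between the estimates for world w and for the real world I, and the Type II loss as the difference between the estimates for m and for the actually played m', which is how the two terms appear in the expansion of formula (8) in claim 5. The three payoff values are made-up numbers chosen only for illustration.

```latex
\begin{align*}
E_{wm} &= 5, \quad E_{Im} = 3, \quad E_{Im'} = 4 \\
L_{wI} &= E_{wm} - E_{Im} = 5 - 3 = 2 \\
L_{mII} &= E_{Im} - E_{Im'} = 3 - 4 = -1 \\
L_{wm} &= L_{wI} + L_{mII} = 2 + (-1) = 1 = E_{wm} - E_{Im'}
\end{align*}
```

The middle term cancels in the sum, so the combined loss depends only on the estimate for the assumed world and the payoff of the sequence that was actually played.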
The invention borrows the concept of risk from economics, studies the principle of the risk model, and combines it with the regret-minimization algorithm for use in imperfect-information machine game playing. While regret minimization supplies a payoff-first strategy, the method also takes the risk of the strategy into account, so as to reach a more reasonable Nash equilibrium.
The steps of the invention are described in detail below with reference to Fig. 1. The basic steps are:
Step 1: Initialization. For each player i ∈ N and each of its information sets I ∈ L_i, the value estimate of the strategy is set to v(I, σ) = 0; for each action a ∈ A(I), the regret value is set to r(I, a) = 0; and the strategy is initialized to σ_i(I, a) = 1/|A(I)|, so that every action is initially equally likely and the probabilities sum to 1.
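The initialization in step 1 can be sketched in a few lines of Python. The dictionary layout (`info_sets` mapping an information-set identifier to its list of actions) and the example identifiers are assumptions for illustration.

```python
def init_tables(info_sets):
    """Return (value, regret, strategy) tables for step 1.

    info_sets: {info_set_id: [actions]} (an assumed layout).
    """
    value = {I: 0.0 for I in info_sets}                                   # v(I, sigma) = 0
    regret = {I: {a: 0.0 for a in acts} for I, acts in info_sets.items()}  # r(I, a) = 0
    strategy = {I: {a: 1.0 / len(acts) for a in acts}                      # uniform 1/|A(I)|
                for I, acts in info_sets.items()}
    return value, regret, strategy

# Hypothetical information sets of one player.
value, regret, strategy = init_tables({"I1": ["fold", "call", "raise"], "I2": ["check", "bet"]})
```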
Step 2: The players act in turn according to their own strategies until the game ends, and the result of each game is recorded.
Step 3: Compute the value estimate and the regret value of each action at every information set visited during the game.
The value at information set I:
$$v_i(I, \sigma) = \sum_{z \in Z_I} u_i(z)\, \pi_{-i}^{\sigma}(z[I])\, \pi^{\sigma}(z[I], z) \tag{4}$$
The regret value for not taking action a at information set I:
$$R_i^T(I, a) = \frac{1}{T} \sum_{t=1}^{T} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, \sigma^t) \right) \tag{5}$$
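Formulas (4) and (5), as given in claim 3, reduce to two short sums once the reach probabilities are known. The sketch below assumes the caller has already computed, for each terminal sequence through I, the triple (utility, opponents' reach probability of z[I], transition probability from z[I] to z); the function names and the sample numbers are illustrative assumptions.

```python
def info_set_value(terminals):
    """Formula (4): sum of u_i(z) * pi_{-i}(z[I]) * pi(z[I], z) over terminal z through I.

    terminals: iterable of (utility, opponent_reach, continuation_probability).
    """
    return sum(u * opp * cont for u, opp, cont in terminals)

def average_regret(values_if_always_a, values_current):
    """Formula (5): mean over T iterations of v_i(I, sigma^t with I->a) - v_i(I, sigma^t)."""
    T = len(values_current)
    return sum(va - v for va, v in zip(values_if_always_a, values_current)) / T

# Made-up numbers: two terminal sequences pass through I.
v = info_set_value([(2.0, 0.5, 0.6), (-1.0, 0.5, 0.4)])
# Made-up values over T = 2 iterations.
r = average_regret([1.0, 0.5], [0.4, 0.3])
```

A positive result from `average_regret` means the player would, on average, have done better at I by always choosing a.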
Step 4: Using the value estimates obtained in the previous step at each visited information set, the regret-matching algorithm reassigns a probability to each action at each information set, yielding a new strategy. Compared with directly taking the action with the largest regret, this has the advantage of preventing an opponent who carries out the same regret computation from detecting one's own strategy. The result is a payoff-first strategy.
For information set I, regret matching gives the payoff-first strategy for the next step:
$$\sigma_i^{T+1}(I, a) = \begin{cases} \dfrac{R_i^{T,+}(I, a)}{\sum_{a' \in A(I)} R_i^{T,+}(I, a')} & \text{if } \sum_{a' \in A(I)} R_i^{T,+}(I, a') > 0 \\[2ex] \dfrac{1}{|A(I)|} & \text{otherwise} \end{cases} \tag{6}$$
where R_i^{T,+}(I, a) = max(R_i^T(I, a), 0) is the positive part of the regret accumulated over T rounds.
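The regret-matching update can be sketched as follows, based on the description in claim 4: when the accumulated regret is positive, each action receives probability proportional to its share of the total; otherwise the next round falls back to the initial uniform strategy. Clipping negative regrets to zero before normalizing is the standard regret-matching convention and is an assumption here, since the formula image is not preserved in the source.

```python
def regret_matching(cumulative_regret):
    """Return next-round action probabilities for one information set.

    cumulative_regret: {action: regret accumulated over T rounds}.
    """
    positive = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positive.values())
    if total > 0:
        # Proportional update over the positive regrets.
        return {a: p / total for a, p in positive.items()}
    # No positive regret: fall back to the uniform strategy.
    n = len(cumulative_regret)
    return {a: 1.0 / n for a in cumulative_regret}

# Made-up regrets for three actions.
next_strategy = regret_matching({"fold": -1.0, "call": 1.0, "raise": 3.0})
```

Because the result stays mixed whenever more than one action has positive regret, the strategy is harder for an opponent to infer than a deterministic choice of the highest-regret action.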
Step 5: Compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round.
The effect of the risk factor on the payoff is considered first.
Given the characteristics of imperfect-information machine game playing, a method for approximating the risk loss is proposed. Its basic idea is to compute the mean of the estimated payoffs over the sample set S and use it in place of the true payoff of I in the world set W.
Suppose the player's world set for the current state is W, with n elements; the sample set of W is S, with t elements; and M is the set of all legal strategy sequences of W, with k elements. The average payoff is computed first.
Definition: \overline{E_s} is the average payoff of the sample set S, computed as:
$$\overline{E_s} = \frac{1}{tk} \sum_{i=1}^{t} \sum_{j=1}^{k} E_{ij}, \quad (i \in S,\ j \in M) \tag{7}$$
Based on formula (7), the overall risk loss of strategy sequence σ is approximated as:
$$L_{W\sigma} = \frac{1}{n} \sum_{i=1}^{n} L_{w_i\sigma}^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( L_{w_i I} + L_{\sigma II} \right)^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( E_{w_i\sigma} - E_{I\sigma} + E_{I\sigma} - E_{I\sigma'} \right)^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( E_{w_i\sigma} - E_{I\sigma'} \right)^{2} \approx \frac{1}{t} \sum_{i=1}^{t} \left( E_{w_i\sigma} - \overline{E_s} \right)^{2}, \quad (w_i \in S) \tag{8}$$
In formula (8), the approximate-equality step is where \overline{E_s} and the sample set S are used for the approximation.
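The approximation in formulas (7) and (8), as written out in claim 5, amounts to a mean followed by a mean squared deviation. The sketch below assumes the payoff estimates are stored as a table `payoff[i][j]` for sampled world i and strategy sequence j; that layout and the sample numbers are illustrative assumptions.

```python
def average_payoff(payoff):
    """Formula (7): mean of E_ij over t sampled worlds and k strategy sequences."""
    t, k = len(payoff), len(payoff[0])
    return sum(sum(row) for row in payoff) / (t * k)

def approx_risk(payoff, j):
    """Right-hand side of formula (8) for strategy sequence j:
    mean over sampled worlds of (E_{w_i, j} - average payoff)^2."""
    mean = average_payoff(payoff)
    return sum((row[j] - mean) ** 2 for row in payoff) / len(payoff)

# Made-up estimates: 2 sampled worlds (rows) x 2 strategy sequences (columns).
payoff = [[2.0, 0.0],
          [2.0, 4.0]]
risk_safe = approx_risk(payoff, 0)   # same payoff in every sampled world
risk_swing = approx_risk(payoff, 1)  # payoff depends strongly on the world
```

In this toy table both sequences have the same average payoff, but the second one swings between worlds and therefore receives the larger risk value.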
With the method above, the risk value of the new strategy can be computed.
Next, the relation between payoff and risk is considered.
Suppose there are strategies A and B. E_A and E_B denote the player's expected payoffs for strategies A and B, L_A and L_B denote the risk losses of A and B, and u_A and u_B denote the actual payoff values of A and B. The rules for judging which of A and B is better are as follows:
1: If u_A - L_A > u_B, then A is better than B; otherwise, if u_B - L_B > u_A, then B is better than A.
2: Otherwise, compute:
$$R = \log\left[ \frac{E_A - (E_B - L_B)}{E_B - (E_A - L_A)} \right] \tag{9}$$
If R > 0, A is better than B; if R < 0, B is better than A; if R = 0, A and B are equally good and the system may choose between them at random.
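The two comparison rules can be sketched as a single function. This sketch makes two assumptions that go beyond the source: it uses one payoff figure per strategy for both rule 1 (written with u in the source) and formula (9) in claim 5 (written with E), and when the numerator or denominator of formula (9) is zero, so that the logarithm is undefined, it falls back to comparing the two directly.

```python
import math

def compare_strategies(e_a, l_a, e_b, l_b):
    """Return 1 if A is better, -1 if B is better, 0 if they are equally good."""
    # Rule 1: one strategy beats the other even after subtracting its own risk loss.
    if e_a - l_a > e_b:
        return 1
    if e_b - l_b > e_a:
        return -1
    # Rule 2: formula (9). Both terms are non-negative once rule 1 has failed.
    num = e_a - (e_b - l_b)
    den = e_b - (e_a - l_a)
    if num > 0 and den > 0:
        r = math.log(num / den)
    else:
        r = num - den  # assumed handling of the degenerate case, not from the source
    return (r > 0) - (r < 0)

# Made-up inputs: equal payoffs, but B carries less risk.
result = compare_strategies(5.0, 2.0, 5.0, 1.0)
```

A return value of 0 leaves the tie-break to the caller, which matches the source's remark that the system may choose at random when R = 0.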
With the method above, the player's old and new strategies can be ranked. The top-ranked strategy is the current strategy that balances risk and payoff, that is, the player's optimal strategy.
Step 6: Determine whether the whole game-playing process has ended; if not, return to step 2 and continue.
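To show how the six steps fit together, here is a toy end-to-end sketch on rock-paper-scissors with a single information set. Nearly everything in it is an illustrative assumption rather than the patent's procedure: the game, the fixed opponent, treating the opponent's hidden action as the "world", sampling 20 worlds per round, measuring risk as the variance of the per-world payoff, and keeping the current strategy on ties.

```python
import math
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0

def regret_matching(cum):
    pos = {a: max(r, 0.0) for a, r in cum.items()}
    total = sum(pos.values())
    if total > 0:
        return {a: p / total for a, p in pos.items()}
    return {a: 1.0 / len(cum) for a in cum}

def payoff_and_risk(strategy, worlds):
    """Mean payoff over sampled worlds and its mean squared deviation."""
    per_world = [sum(p * payoff(a, w) for a, p in strategy.items()) for w in worlds]
    mean = sum(per_world) / len(per_world)
    return mean, sum((e - mean) ** 2 for e in per_world) / len(per_world)

def better(cand, cur):
    """True if the candidate (payoff, risk) pair ranks above the current one."""
    (ea, la), (eb, lb) = cand, cur
    if ea - la > eb:
        return True
    if eb - lb > ea:
        return False
    num, den = ea - (eb - lb), eb - (ea - la)
    if num > 0 and den > 0:
        return math.log(num / den) > 0
    return num > den  # assumed handling when the logarithm is undefined

def train(rounds, opponent, seed=0):
    rng = random.Random(seed)
    opp_weights = [opponent[a] for a in ACTIONS]
    strategy = {a: 1.0 / len(ACTIONS) for a in ACTIONS}   # step 1
    cum_regret = {a: 0.0 for a in ACTIONS}
    for _ in range(rounds):                                # step 6: repeat
        mine = rng.choices(ACTIONS, weights=[strategy[a] for a in ACTIONS])[0]  # step 2
        theirs = rng.choices(ACTIONS, weights=opp_weights)[0]
        for a in ACTIONS:                                  # step 3
            cum_regret[a] += payoff(a, theirs) - payoff(mine, theirs)
        candidate = regret_matching(cum_regret)            # step 4
        worlds = rng.choices(ACTIONS, weights=opp_weights, k=20)  # step 5
        if better(payoff_and_risk(candidate, worlds), payoff_and_risk(strategy, worlds)):
            strategy = candidate
    return strategy

final = train(200, {"rock": 0.6, "paper": 0.2, "scissors": 0.2})
```

The point of the sketch is the control flow: the regret-matching candidate from step 4 is adopted only if it also passes the risk-aware comparison in step 5.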
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the implementation of the invention should not be regarded as limited to these descriptions. For a person of ordinary skill in the technical field of the invention, simple deductions or substitutions made without departing from the concept of the invention shall all be regarded as falling within the protection scope of the invention.

Claims (5)

  1. A minimum-regret evaluation method for balancing risk and payoff in imperfect-information games, characterized in that:
    it comprises the following steps:
    Step 1: for each information set, initialize its strategy, its value estimate, and the regret value of each action;
    Step 2: play the game with the current strategy until the game is completed;
    Step 3: compute the value estimate and the regret value of each action at every information set visited during the game;
    Step 4: compute a new strategy with the regret-matching algorithm;
    Step 5: compute the risk value of the new strategy and, weighing payoff against risk, select the strategy to use in the next round;
    Step 6: return to step 2 until the game-playing process ends.
  2. The minimum-regret evaluation method for balancing risk and payoff in imperfect-information games according to claim 1, characterized in that: in step 1, the initialization is as follows: for each player i ∈ N and each of its information sets I ∈ L_i, the value estimate of the strategy is v(I, σ) = 0; for each a ∈ A(I), the regret value of action a at information set I is r(I, a) = 0; and the strategy is initialized to σ_i(I, a) = 1/|A(I)|, meaning that initially every action has equal probability and the probabilities sum to 1; wherein N is the finite set of players, L_i is the information partition of player i, I is an information set, σ is the strategy profile, and a is an action.
  3. The minimum-regret evaluation method for balancing risk and payoff in imperfect-information games according to claim 2, characterized in that: in step 3, the value at information set I is:
    $$v_i(I, \sigma) = \sum_{z \in Z_I} u_i(z)\, \pi_{-i}^{\sigma}(z[I])\, \pi^{\sigma}(z[I], z) \tag{4}$$
    The regret value for not taking action a at information set I is:
    $$R_i^T(I, a) = \frac{1}{T} \sum_{t=1}^{T} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, \sigma^t) \right) \tag{5}$$
    wherein z is a terminal sequence in the set Z_I of terminal sequences; u_i(z) is the actual utility value after the game reaches that final state; z[I] is the prefix of terminal sequence z at information set I; π_{-i}^σ(z[I]) is the probability that all opponents of player i reach z[I]; π^σ(z[I], z) is the transition probability, over all players, from history z[I] to z; and σ^t_{(I→a)} is a strategy profile identical to σ^t except that action a is always selected at information set I. Formula (5) computes the average regret of player i for not taking action a over T iterations.
  4. The minimum-regret evaluation method for balancing risk and payoff in imperfect-information games according to claim 3, characterized in that: in step 4, using the value estimates obtained in the previous step at each visited information set, the regret-matching algorithm reassigns a probability to each action at each information set, yielding a new strategy and thus a payoff-first strategy; for information set I, regret matching gives the payoff-first strategy for the next step:
    $$\sigma_i^{T+1}(I, a) = \begin{cases} \dfrac{R_i^{T,+}(I, a)}{\sum_{a' \in A(I)} R_i^{T,+}(I, a')} & \text{if } \sum_{a' \in A(I)} R_i^{T,+}(I, a') > 0 \\[2ex] \dfrac{1}{|A(I)|} & \text{otherwise} \end{cases} \tag{6}$$
    wherein the formula means: when the accumulated regret is positive, it is normalized by the total regret and the strategy is updated proportionally; otherwise the strategy for the next iteration is the initial uniform strategy; R is the regret accumulated over T rounds and R^{T,+} = max(R^T, 0) is its positive part, a is an action, I is an information set, and σ_i^{T+1}(I, a) is the probability that player i takes action a at information set I in the next round (round T+1).
  5. The minimum-regret evaluation method for balancing risk and payoff in imperfect-information games according to claim 4, characterized in that: in step 5, given the characteristics of imperfect-information machine game playing, a method for approximating the risk loss is proposed, whose basic idea is to compute the mean of the estimated payoffs over the sample set S and use it in place of the true payoff of I in the world set W;
    suppose the player's world set for the current state is W, with n elements; the sample set of W is S, with t elements; and M is the set of all legal strategy sequences of W, with k elements; the average payoff is computed first:
    Definition: \overline{E_s} is the average payoff of the sample set S, computed as:
    $$\overline{E_s} = \frac{1}{tk} \sum_{i=1}^{t} \sum_{j=1}^{k} E_{ij}, \quad (i \in S,\ j \in M) \tag{7}$$
    Based on formula (7), the overall risk loss of strategy sequence σ is approximated as:
    $$L_{W\sigma} = \frac{1}{n} \sum_{i=1}^{n} L_{w_i\sigma}^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( L_{w_i I} + L_{\sigma II} \right)^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( E_{w_i\sigma} - E_{I\sigma} + E_{I\sigma} - E_{I\sigma'} \right)^{2} = \frac{1}{n} \sum_{i=1}^{n} \left( E_{w_i\sigma} - E_{I\sigma'} \right)^{2} \approx \frac{1}{t} \sum_{i=1}^{t} \left( E_{w_i\sigma} - \overline{E_s} \right)^{2}, \quad (w_i \in S) \tag{8}$$
    In formula (8), the approximate-equality step is where \overline{E_s} and the sample set S are used for the approximation; the risk value of the new strategy is computed with the method above;
    next, the relation between payoff and risk is considered;
    suppose there are strategies A and B; E_A and E_B denote the player's expected payoffs for strategies A and B, and L_A and L_B denote the risk losses of A and B; the rules for judging which of A and B is better are as follows:
    1: if u_A - L_A > u_B, then A is better than B; otherwise, if u_B - L_B > u_A, then B is better than A;
    2: otherwise, compute:
    $$R = \log\left[ \frac{E_A - (E_B - L_B)}{E_B - (E_A - L_A)} \right] \tag{9}$$
    if R > 0, A is better than B; if R < 0, B is better than A; if R = 0, A and B are equally good and the system may choose between them at random;
    with the method above, the player's old and new strategies are ranked, and the top-ranked strategy is the current strategy that balances risk and payoff, that is, the player's optimal strategy; wherein R represents the risk, L_A and L_B represent the risk losses of strategies A and B, and u_A and u_B represent the actual payoff values of strategies A and B.
CN201610658485.2A 2016-08-10 2016-08-10 Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games Pending CN106296006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610658485.2A CN106296006A (en) 2016-08-10 2016-08-10 Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610658485.2A CN106296006A (en) 2016-08-10 2016-08-10 Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games

Publications (1)

Publication Number Publication Date
CN106296006A (en) 2017-01-04

Family

ID=57668611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610658485.2A Pending CN106296006A (en) 2016-08-10 2016-08-10 Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games

Country Status (1)

Country Link
CN (1) CN106296006A (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829566A (en) * 2018-12-26 2019-05-31 中国人民解放军国防科技大学 Method for generating combat action sequence
CN112639841B (en) * 2019-01-17 2024-02-06 创新先进技术有限公司 Sampling scheme for policy searching in multiparty policy interactions
RU2743626C1 (en) * 2019-01-17 2021-02-20 Эдванст Нью Текнолоджиз Ко., Лтд. Strategy search in strategic interaction between parties
CN112639841A (en) * 2019-01-17 2021-04-09 创新先进技术有限公司 Sampling scheme for policy search in multi-party policy interaction
CN112292701A (en) * 2019-01-17 2021-01-29 创新先进技术有限公司 Conducting policy search in multi-party policy interaction
KR102133143B1 (en) * 2019-01-17 2020-07-13 알리바바 그룹 홀딩 리미티드 Strategic search in strategic interaction between parties
WO2020147075A1 (en) * 2019-01-17 2020-07-23 Alibaba Group Holding Limited Strategy searching in strategic interaction between parties
CN112292699A (en) * 2019-05-15 2021-01-29 创新先进技术有限公司 Determining action selection guidelines for an execution device
WO2020227960A1 (en) * 2019-05-15 2020-11-19 Advanced New Technologies Co., Ltd. Determining action selection policies of an execution device
WO2020227958A1 (en) * 2019-05-15 2020-11-19 Advanced New Technologies Co., Ltd. Determining action selection policies of execution device
WO2020227954A1 (en) * 2019-05-15 2020-11-19 Advanced New Technologies Co., Ltd. Determining action selection policies of an execution device
CN112292698A (en) * 2019-05-15 2021-01-29 创新先进技术有限公司 Determining action selection guidelines for an execution device
CN112292696A (en) * 2019-05-15 2021-01-29 创新先进技术有限公司 Determining action selection guidelines for an execution device
CN110404264B (en) * 2019-07-25 2022-11-01 哈尔滨工业大学(深圳) Multi-person non-complete information game strategy solving method, device and system based on virtual self-game and storage medium
CN110404265B (en) * 2019-07-25 2022-11-01 哈尔滨工业大学(深圳) Multi-user non-complete information machine game method, device and system based on game incomplete on-line resolving and storage medium
CN110404264A (en) * 2019-07-25 2019-11-05 哈尔滨工业大学(深圳) It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game
CN110404265A (en) * 2019-07-25 2019-11-05 哈尔滨工业大学(深圳) A kind of non-complete information machine game method of more people based on game final phase of a chess game online resolution, device, system and storage medium
CN110599051A (en) * 2019-09-19 2019-12-20 桂林电子科技大学 Sub-game perfect Nash balanced fetching method of two agents
CN110772798A (en) * 2019-10-23 2020-02-11 桂林电子科技大学 Method for searching Nash equilibrium sequence based on FIP structure
US11113619B2 (en) 2019-12-12 2021-09-07 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device
US11077368B2 (en) 2019-12-12 2021-08-03 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device
US11144841B2 (en) 2019-12-12 2021-10-12 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device
CN112041875B (en) * 2019-12-12 2022-04-22 支付宝(杭州)信息技术有限公司 Determining action selection guidelines for an execution device
CN112041811B (en) * 2019-12-12 2022-09-16 支付宝(杭州)信息技术有限公司 Determining action selection guidelines for an execution device
CN112041875A (en) * 2019-12-12 2020-12-04 支付宝(杭州)信息技术有限公司 Determining action selection guidelines for an execution device
CN112041811A (en) * 2019-12-12 2020-12-04 支付宝(杭州)信息技术有限公司 Determining action selection guidelines for an execution device
US11157316B1 (en) 2020-04-02 2021-10-26 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device
US11204803B2 (en) 2020-04-02 2021-12-21 Alipay (Hangzhou) Information Technology Co., Ltd. Determining action selection policies of an execution device
CN111905373A (en) * 2020-07-23 2020-11-10 深圳艾文哲思科技有限公司 Artificial intelligence decision method and system based on game theory and Nash equilibrium
CN112149824A (en) * 2020-09-15 2020-12-29 支付宝(杭州)信息技术有限公司 Method and device for updating recommendation model by game theory
CN114580642A (en) * 2022-03-17 2022-06-03 中国科学院自动化研究所 Method, device, equipment and medium for constructing game AI model and processing data

Similar Documents

Publication Publication Date Title
CN106296006A (en) Minimum-regret evaluation method for balancing risk and payoff in imperfect-information games
Glickman Dynamic paired comparison models with stochastic variances
CN106339582B (en) A kind of chess and card games automation final phase of a chess game generation method based on game playing by machine technology
CN103942461B (en) Water quality parameter Forecasting Methodology based on online online-sequential extreme learning machine
Cho et al. Using social network analysis and gradient boosting to develop a soccer win–lose prediction model
Wheatcroft A profitable model for predicting the over/under market in football
CN106169063A (en) A kind of method in automatic identification user's reading interest district
Karlis et al. On modelling soccer data
Spanias et al. Predicting the outcomes of tennis matches using a low-level point model
Baker et al. Optimal betting under parameter uncertainty: Improving the Kelly criterion
Dobson et al. Persistence in sequences of football match results: A Monte Carlo analysis
Caliwag et al. Predicting basketball results using cascading algorithm
Stekler et al. Predicting the outcomes of NCAA basketball championship games
Ye South Korea's free trade strategy and East Asian regionalism: A multistage approach
Sarkar et al. An online system for player-vs-level matchmaking in human computation games
Albert Streakiness in team performance
Everson et al. Composite Poisson models for goal scoring
Akhtar et al. Rating batters in test cricket
Goushehgir et al. Developing a set of key performance indicators for monitoring sustainability of forest functions in the Hyrcanian forests
Omidiran Penalized regression models for the NBA
Nishino et al. Parallel monte carlo search for imperfect information game daihinmin
CN113658681A (en) Decision tree-based drug abstinence personnel abstinence effect evaluation method
Klein-Soetebier et al. Match analysis in table tennis
Peltola Forecasting English Premier League season outcomes using expected goals-based Monte Carlo simulation
Traneva " Prof. Asen Zlatarov” University," Prof. Yakimov" Blvd, Bourgas 8000, Bulgaria veleka13@ gmail. com, tranev@ abv. bg

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170104