CN108809713A - Monte Carlo tree searching method based on optimal resource allocation algorithm - Google Patents

Info

Publication number
CN108809713A
Authority
CN
China
Prior art keywords
monte carlo
decision
decision scheme
carlo tree
scheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810593129.6A
Other languages
Chinese (zh)
Other versions
CN108809713B (en)
Inventor
陈子豪
李斌
李厚强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201810593129.6A
Publication of CN108809713A
Application granted
Publication of CN108809713B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08: Configuration management of networks or network elements
    • H04L41/0803: Configuration setting
    • H04L41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a Monte Carlo tree search method based on an optimal resource allocation algorithm. Only the selection strategy for the child nodes of the root node of the Monte Carlo tree is adjusted: the optimal resource allocation algorithm distributes simulation computing resources among the Monte Carlo subtrees corresponding to the child nodes, while the search method used within each subtree, such as the tree policy, remains unchanged. This allows the method to be combined easily with existing Monte Carlo tree search methods while improving the decision performance of Monte Carlo tree search when computing resources are limited. The method is applicable to Monte Carlo tree search methods of all concrete forms and therefore has a wide range of applications.

Description

Monte Carlo tree searching method based on optimal resource allocation algorithm
Technical Field
The invention relates to the technical field of games, in particular to a Monte Carlo tree searching method based on an optimal resource allocation algorithm.
Background
A Markov Decision Process (MDP) models a sequential decision problem in a known environment using the quadruple {state set, action set, transition model, reward function}. A complete decision process can be described by a sequence of {state, action} pairs, where each next state s' is drawn from a probability distribution that depends on the current state s and the chosen action a. A policy in an MDP is a mapping from the state space to the action space, i.e., the rule for choosing a specific action in each state. The goal is to find the policy that maximizes the expected return. When the number of states in the environment is too large or hard to enumerate, a policy cannot be evaluated efficiently. One effective remedy is to use Monte Carlo Tree Search (MCTS) to estimate the value function of each {state, action} pair in place of full policy evaluation.
Monte Carlo tree search is a method for finding the best decision in a given domain by sampling the decision space at random and building a search tree from the results. It has had a profound impact on artificial intelligence (AI); in principle, MCTS can be applied in any domain that can be described by {state, action} pairs and in which outcomes can be predicted through simulation. Interest in MCTS research has risen sharply because of the great success MCTS has achieved in the game of Go and its potential application to many other problems.
The roots of MCTS go back to 1928, when John von Neumann's minimax theorem paved the way for adversarial tree search methods. The Monte Carlo method was then formally introduced in the 1940s as a way of using random sampling to handle problems that are poorly suited to tree search. Finally, in 2006, Rémi Coulom combined the two ideas and proposed MCTS for move planning in Go.
MCTS has since been studied extensively and many variants have emerged, such as Upper Confidence bounds applied to Trees (UCT), single-player and multi-player MCTS, real-time MCTS, and so on. The tree policy of MCTS, among other components, has also been improved and enhanced. However, all Monte Carlo-based methods share one trait: they estimate the properties of the problem at hand statistically, through a large number of simulations. When computing resources are scarce, some critical state nodes or action edges may never be visited during the search, even for problems of moderate complexity, which is why MCTS performs poorly with limited computing resources.
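The quadruple and the idea of estimating the value of a {state, action} pair by sampling can be made concrete with a minimal sketch. It uses plain Monte Carlo rollouts rather than a full tree search, and every name in it (`MDP`, `sample_next`, `estimate_q`, the toy chain `toy`) is hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int
Action = str


@dataclass
class MDP:
    """The quadruple {state set, action set, transition model, reward function}."""
    states: List[State]
    actions: List[Action]
    transition: Callable[[State, Action], Dict[State, float]]  # P(s' | s, a)
    reward: Callable[[State, Action, State], float]            # r(s, a, s')


def sample_next(mdp: MDP, s: State, a: Action, rng: random.Random) -> State:
    """Draw the next state s' from the distribution P(. | s, a)."""
    dist = mdp.transition(s, a)
    return rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]


def estimate_q(mdp: MDP, s: State, a: Action,
               policy: Callable[[State], Action],
               horizon: int, n_rollouts: int, rng: random.Random) -> float:
    """Average undiscounted return of rollouts that start with a, then follow policy."""
    total = 0.0
    for _ in range(n_rollouts):
        state, action, ret = s, a, 0.0
        for _ in range(horizon):
            nxt = sample_next(mdp, state, action, rng)
            ret += mdp.reward(state, action, nxt)
            state, action = nxt, policy(nxt)
        total += ret
    return total / n_rollouts


# Toy two-state chain (illustrative assumption): "go" moves 0 -> 1 and pays 1 once;
# state 1 is absorbing and pays nothing.
toy = MDP(
    states=[0, 1],
    actions=["stay", "go"],
    transition=lambda s, a: {1: 1.0} if (s == 1 or a == "go") else {0: 1.0},
    reward=lambda s, a, s2: 1.0 if (s == 0 and s2 == 1) else 0.0,
)
```

Calling `estimate_q(toy, 0, "go", lambda s: "stay", 3, 10, random.Random(0))` averages ten three-step rollouts that begin with "go". MCTS refines this idea by growing a tree that concentrates rollouts on promising branches.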
Disclosure of Invention
The invention aims to provide a Monte Carlo tree searching method based on an optimal resource allocation algorithm, which can greatly improve the Monte Carlo tree searching performance under the condition of limited computing resources.
The purpose of the invention is realized by the following technical scheme:
a Monte Carlo tree searching method based on an optimal resource allocation algorithm comprises the following steps:
taking the initial state of the problem to be decided as the root node $R_0$ of the Monte Carlo tree; if n actions exist in the corresponding action space, n child nodes of the root node $R_0$ are formed; each child node is used as the root node of a sub-Monte Carlo tree, and each child node is used as a decision scheme of the optimal resource allocation algorithm;
allocating initial computing resources to each decision scheme, performing Monte Carlo tree search iterations of the corresponding computing resource amount on the sub-Monte Carlo tree corresponding to each decision scheme, and recording the return of each iteration;
judging whether the sum $\sum_{i=1}^{n} N_i^l$ of the computing resources used by all decision schemes after the $l$-th round is not less than the maximum available computing resource $T$; wherein $N_i^l$ represents the total computing resources of the $i$-th decision scheme after the $l$-th round of computing resource allocation;
if not, increasing the computing resources by $\Delta$, determining the amount of computing resources actually available to each decision scheme in the $(l+1)$-th round of computation by using the optimal resource allocation algorithm according to the historical returns of each decision scheme, and executing the same iterative computation as in the previous step;
if yes, ending the Monte Carlo tree search process, thereby determining the action corresponding to the decision scheme with the best average performance.
As can be seen from the above technical solution, only the selection strategy for the child nodes of the root node of the Monte Carlo tree is adjusted: the optimal resource allocation algorithm distributes simulation computing resources among the Monte Carlo subtrees corresponding to the child nodes, while the search method used within each subtree, such as the tree policy, remains unchanged. The method can therefore be combined easily with existing Monte Carlo tree search methods, and it improves the decision performance of Monte Carlo tree search when computing resources are limited. The method is applicable to Monte Carlo tree search methods of all concrete forms and has a wide range of applications.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a monte carlo tree search method based on an optimal resource allocation algorithm according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of Monte Carlo tree search based on an optimal resource allocation algorithm according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the process of performing Monte Carlo tree search on child nodes according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a Monte Carlo tree search method based on the optimal resource allocation algorithm, i.e., Optimal Computing Budget Allocation (OCBA), which addresses the poor decision performance of Monte Carlo tree search when computing resources are limited.
The main process of the invention is shown in figure 1, which mainly comprises the following parts:
1. Take the initial state of the problem to be decided as the root node $R_0$ of the Monte Carlo tree; if n actions exist in the corresponding action space, n child nodes of the root node $R_0$ are formed; each child node is used as the root node of a sub-Monte Carlo tree, and each child node is used as a decision scheme of the optimal resource allocation algorithm.
In the embodiment of the invention, assuming that n actions exist in the corresponding action space, executing each of them leads to n new states, which form the n child nodes of the root node $R_0$. Each child node serves as the root node of one sub-Monte Carlo tree, giving n mutually independent sub-Monte Carlo trees $\mathrm{SMCT}_i$, and each child node serves as one decision scheme $\theta_i$ of the optimal resource allocation algorithm.
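Step 1 can be sketched as a small data structure: one record per root action, holding the state that roots the corresponding sub-tree and the returns observed so far. This is an illustrative sketch, not the patent's implementation; `DecisionScheme`, `make_schemes`, `legal_actions` and `step` are hypothetical names, and the sub-tree itself is reduced to its root state.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, List


@dataclass
class DecisionScheme:
    """One child of the root R_0, treated as a decision scheme theta_i."""
    action: Hashable        # action a_i taken at the root
    subtree_root: Hashable  # state reached by a_i; root of SMCT_i
    returns: List[float] = field(default_factory=list)  # one entry per iteration

    @property
    def used(self) -> int:
        """Total computing resources consumed so far, counted in iterations."""
        return len(self.returns)


def make_schemes(root_state: Hashable,
                 legal_actions: Callable[[Hashable], List[Hashable]],
                 step: Callable[[Hashable, Hashable], Hashable]) -> List[DecisionScheme]:
    """Expand the root once: n actions give n child states and n schemes."""
    return [DecisionScheme(a, step(root_state, a)) for a in legal_actions(root_state)]
```

Keeping the returns per scheme, rather than only a running average, makes both the mean and the variance needed later by the allocation rule easy to compute.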
2. Allocate initial computing resources to each decision scheme, perform Monte Carlo tree search iterations of the corresponding computing resource amount on the sub-Monte Carlo tree corresponding to each decision scheme, and record the return of each iteration.
In the embodiment of the present invention, initially, that is, when $l = 0$, initial computing resources are allocated to each decision scheme, and Monte Carlo tree search iterations of the corresponding computing resource amount are performed on the sub-Monte Carlo tree corresponding to each decision scheme.
For ease of understanding, in the embodiment of the present invention the computing resource may be regarded as the number of Monte Carlo tree search iterations; let $l = 0$ and $N_i^0 = N_0$, so that $N_0$ Monte Carlo tree search iterations are performed on the sub-Monte Carlo tree $\mathrm{SMCT}_i$ corresponding to each decision scheme, and the return of each iteration is recorded.
In fact, in different environments, computing resources may also be understood as computing time, storage space, and the like.
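The initial round can be sketched as follows, treating one unit of computing resource as one search iteration. The function name `initial_round` and the callback `run_iteration(i)`, which stands for one Monte Carlo tree search iteration on the i-th sub-tree and yields its return, are assumptions made for illustration.

```python
from typing import Callable, List


def initial_round(n_schemes: int, n0: int,
                  run_iteration: Callable[[int], float]) -> List[List[float]]:
    """Round l = 0: run n0 iterations on every sub-tree and record each return."""
    returns: List[List[float]] = [[] for _ in range(n_schemes)]
    for i in range(n_schemes):
        for _ in range(n0):
            returns[i].append(run_iteration(i))
    return returns
```

After this round every scheme has consumed the same amount, so `len(returns[i])` equals `n0` for all i. If resources were measured in computing time instead, the inner loop would run until a per-scheme time slice expired.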
3. Judge whether the sum $\sum_{i=1}^{n} N_i^l$ of the computing resources used by all decision schemes after the $l$-th round is not less than the maximum available computing resource $T$.
In the embodiment of the present invention, $N_i^l$ represents the total computing resources of decision scheme $\theta_i$ after the $l$-th round of computing resource allocation, i.e., the sum of the computing resources used by that decision scheme in the $l$-th round and in every round before it.
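The bookkeeping behind this test is small enough to show directly. The sketch below assumes integer iteration counts; `accumulate` and `budget_exhausted` are hypothetical helper names.

```python
from typing import List, Sequence


def accumulate(totals: Sequence[int], used_this_round: Sequence[int]) -> List[int]:
    """Per-scheme cumulative totals: previous total plus this round's usage."""
    return [n + extra for n, extra in zip(totals, used_this_round)]


def budget_exhausted(totals: Sequence[int], max_budget: int) -> bool:
    """Step 3: stop once the summed usage is not less than the budget T."""
    return sum(totals) >= max_budget
```

For example, totals of 4, 4 and 4 followed by a round using 2, 0 and 1 give 6, 4 and 5, which exhausts a budget of 15 but not one of 16.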
4. If not, increase the computing resources by $\Delta$; using the optimal resource allocation algorithm and the historical returns of each decision scheme, determine the total computing resource amount $N_i^{l+1}$ of each decision scheme over rounds 1 to $l+1$, derive from it the amount of computing resources actually available to each decision scheme in round $l+1$, and perform the same Monte Carlo tree search iterations as in step 2.
In the embodiment of the invention, the optimal resource allocation algorithm allocates a total available computing resource amount of $\sum_{i=1}^{n} N_i^l + \Delta$ among the decision schemes according to the mean and variance of the historical returns of each decision scheme; the amount of computing resources obtained by decision scheme $\theta_i$ in round $l+1$ is $N_i^{l+1}$. The difference between $N_i^{l+1}$ and $N_i^l$ determines the amount of computing resources actually available to each scheme in the simulation computation of round $l+1$.
Specifically, among all decision schemes $\theta_i$, $i \in I = \{1, 2, \dots, n\}$, denote any non-optimal decision scheme by $\theta_j$, the optimal decision scheme by $\theta_b$, and the other decision schemes by $\theta_x$, $x \in X$, where $j, b \in I$, $j \neq b$, and $X = I - \{j, b\}$. The symbols $j$, $b$ and $x$ are likewise used as labels for the properties of the non-optimal decision scheme, the optimal decision scheme and the other decision schemes, respectively. For example, if $j = 1$ and $b = 2$, then $X = \{3, 4, 5, \dots, n\}$.
Then, for all $i \in I = \{1, 2, \dots, n\}$, $j, b \in I$, $j \neq b$, $x \in X = I - \{j, b\}$, the following formulas hold:
wherein:
In the above formulas, $N_j^{l+1}$, $N_b^{l+1}$ and $N_x^{l+1}$ respectively represent the amounts of computing resources obtained in round $l+1$ by the non-optimal decision scheme $\theta_j$, the optimal decision scheme $\theta_b$ and the other decision schemes $\theta_x$; $N$ denotes the total computing resources of the decision scheme indicated by the subscript after resources are allocated in the round indicated by the superscript; $\mu_k(\theta_i)$ represents the return of decision scheme $\theta_i$ at the $k$-th computation; $b$ is the label of the decision scheme with the highest average historical return after the $l$-th round of search iterations; $\mu$ represents the mean of the historical returns and $\delta$ their variance, with the superscript $l$ giving the round number and the subscript giving the label of the decision scheme, e.g., $\mu_i^l$ and $\delta_i^l$ are the mean and variance of the historical returns of decision scheme $\theta_i$ computed over rounds 1 to $l$. The two remaining quantities are intermediate parameters; they can be regarded as the scaling factors of the total resource amounts of the other schemes $\theta_x$ and of the optimal scheme $\theta_b$ relative to the total resource amount obtained by the selected non-optimal decision scheme $\theta_j$ in round $l$. Suppose decision scheme $\theta_j$ obtains one unit of computing resources. The formula for the scaling factor of $\theta_x$ shows that the larger the historical return mean of $\theta_x$ (better performance) and the larger its variance (uncertain performance, so more computation is needed to determine the true performance), the larger the factor, and hence the more computation is allocated to $\theta_x$.
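For orientation, the allocation rule of the standard OCBA procedure from the simulation-optimization literature can be written with the same scheme labels. This is a sketch under stated assumptions, not a transcription of the formulas of this description: the names $\beta_x$ and $\beta_b$ for the two scaling factors are hypothetical, $\sigma$ denotes the standard deviation of the historical returns, and the total budget is taken to be the resources used so far plus the increment $\Delta$.

```latex
\begin{aligned}
b &= \arg\max_{i \in I} \mu_i^l, \\
\beta_x &= \left( \frac{\sigma_x^l / (\mu_b^l - \mu_x^l)}{\sigma_j^l / (\mu_b^l - \mu_j^l)} \right)^{2}, \qquad x \in X, \\
\beta_b &= \sigma_b^l \sqrt{ \frac{1}{(\sigma_j^l)^2} + \sum_{x \in X} \frac{\beta_x^2}{(\sigma_x^l)^2} }, \\
N_j^{l+1} &= \frac{\sum_{i \in I} N_i^l + \Delta}{1 + \beta_b + \sum_{x \in X} \beta_x}, \qquad
N_b^{l+1} = \beta_b \, N_j^{l+1}, \qquad
N_x^{l+1} = \beta_x \, N_j^{l+1}.
\end{aligned}
```

With $\theta_j$ as the unit of measure, $\beta_x$ grows as $\mu_x^l$ approaches $\mu_b^l$ and as $\sigma_x^l$ increases, so schemes that are close to the best or still noisy receive more computation.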
Combining $N_i^{l+1}$ and $N_i^l$, the difference between the two determines the amount of computing resources actually available to each decision scheme in round $l+1$, for all $i \in I$; that is, the sub-Monte Carlo tree $\mathrm{SMCT}_i$ corresponding to each decision scheme performs Monte Carlo tree search iterations with that resource amount, after which the total computing resources of each decision scheme are updated to reflect the allocation of round $l+1$.
After step 4 has been executed, return to the judgment in step 3; if the result is negative, execute step 4 again, and repeat until the result is positive, then proceed to step 5.
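A minimal sketch of one allocation step is given below. It uses the standard OCBA ratios from the literature with standard deviations as the spread measure; the clipping of negative differences to zero, the rounding to whole iterations, and the names `ocba_targets`, `round_increments` and `eps` are assumptions of this sketch, not details confirmed by the description. At least two schemes are assumed.

```python
import math
from typing import List, Sequence


def ocba_targets(means: Sequence[float], stds: Sequence[float],
                 budget: float, eps: float = 1e-9) -> List[float]:
    """Split `budget` across schemes using the standard OCBA ratios."""
    n = len(means)
    b = max(range(n), key=lambda i: means[i])   # scheme with the best mean
    others = [i for i in range(n) if i != b]
    j = others[0]                               # reference non-best scheme

    def gap(i: int) -> float:
        return max(means[b] - means[i], eps)    # guard against ties

    def sd(i: int) -> float:
        return max(stds[i], eps)                # guard against zero spread

    ratio = [0.0] * n
    for i in others:
        # Scaling factor of scheme i relative to the reference scheme j.
        ratio[i] = ((sd(i) / gap(i)) / (sd(j) / gap(j))) ** 2
    # Scaling factor of the best scheme relative to scheme j.
    ratio[b] = sd(b) * math.sqrt(sum((ratio[i] / sd(i)) ** 2 for i in others))
    scale = budget / sum(ratio)
    return [r * scale for r in ratio]


def round_increments(targets: Sequence[float], totals: Sequence[int]) -> List[int]:
    """Extra iterations per scheme this round; never negative (assumption)."""
    return [max(0, round(t) - used) for t, used in zip(targets, totals)]
```

Because a scheme cannot give back iterations it has already run, the clipped increments can sum to slightly more than $\Delta$ in a round. This is harmless here, since the stopping test only asks whether the total used is not less than $T$.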
5. End the Monte Carlo tree search process and determine the action corresponding to the decision scheme with the best average performance.
After the Monte Carlo tree search is finished, the action to take is the one whose decision scheme has the best average performance.
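Steps 2 to 5 can be sketched as a single outer loop. To keep the sketch short, the allocation rule is passed in as a function: `allocate(means, stds, budget)` is a hypothetical parameter that returns the target total per scheme, and `run_iteration(i)` stands for one search iteration on the i-th sub-tree. Counting resources in iterations, clipping increments at zero and forcing at least one iteration per round are assumptions of this sketch.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Allocator = Callable[[Sequence[float], Sequence[float], float], Sequence[float]]


def root_budget_search(n_schemes: int,
                       run_iteration: Callable[[int], float],
                       allocate: Allocator,
                       n0: int, delta: int, max_budget: int) -> Tuple[int, List[int]]:
    """Return (index of the best scheme, iterations used per scheme)."""
    # Step 2: initial round, n0 iterations on every sub-tree.
    returns = [[run_iteration(i) for _ in range(n0)] for i in range(n_schemes)]

    # Step 3: continue while the total used is still below the budget T.
    while sum(len(r) for r in returns) < max_budget:
        used = [len(r) for r in returns]
        means = [statistics.mean(r) for r in returns]
        stds = [statistics.pstdev(r) for r in returns]

        # Step 4: raise the budget by delta and ask for new per-scheme totals.
        targets = allocate(means, stds, sum(used) + delta)
        extra = [max(0, round(t) - u) for t, u in zip(targets, used)]
        if sum(extra) == 0:
            # Rounding left nothing to do; give one iteration to the scheme
            # furthest below its target so that the loop always progresses.
            neediest = max(range(n_schemes), key=lambda i: targets[i] - used[i])
            extra[neediest] = 1
        for i, k in enumerate(extra):
            returns[i].extend(run_iteration(i) for _ in range(k))

    # Step 5: pick the scheme with the best average return.
    means = [statistics.mean(r) for r in returns]
    best = max(range(n_schemes), key=lambda i: means[i])
    return best, [len(r) for r in returns]
```

Passing the allocator as a parameter mirrors the point made in the description: only the root-level distribution of computation changes, while whatever `run_iteration` does inside each sub-tree is left untouched.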
In the above solution of the embodiment of the present invention, only the selection strategy for the child nodes of the root node of the Monte Carlo tree is affected: the optimal resource allocation algorithm distributes simulation computing resources among the Monte Carlo subtrees corresponding to the child nodes, while the search method used within each subtree, such as the tree policy, remains unchanged. The method can therefore be combined easily with existing Monte Carlo tree search methods, and it improves the decision performance of Monte Carlo tree search when computing resources are limited. The method is applicable to Monte Carlo tree search methods of all concrete forms and has a wide range of applications.
For ease of understanding, the following description is made in connection with an example.
The technical solution of the embodiment of the invention is applicable to Monte Carlo tree search methods of all concrete forms. In this example, move selection in the game of Reversi (black-and-white chess) is the object of study, and the concrete form of Monte Carlo tree search is Upper Confidence bounds applied to Trees (UCT). The root node $R_0$ of the UCT is the board state in which a move is to be made, the action space consists of all positions where the player can place a piece in the current board state, each action corresponds to one position, and there are n such actions in total.
Each child node of the root node is the new board state reached after executing action $a_i$, i.e., after the piece is placed. Each child node serves as a new root node for a UCT search, which generates a new Monte Carlo tree $\mathrm{SMCT}_i$, i.e., a subtree of the Monte Carlo tree rooted at node $R_0$.
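Inside each sub-tree the tree policy is left as it is. For UCT that policy is commonly the UCB1 rule, sketched below for reference; the exploration constant `c`, its default value and the function names are assumptions, since the description does not specify them.

```python
import math
from typing import Sequence, Tuple


def ucb1(mean_return: float, child_visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """UCB1 score: exploitation term plus an exploration bonus."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_return + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(stats: Sequence[Tuple[float, int]], c: float = math.sqrt(2)) -> int:
    """Pick the child index with the highest UCB1 score.

    `stats` holds one (mean return, visit count) pair per child; the parent
    visit count is taken as the sum of the child visit counts.
    """
    parent_visits = sum(visits for _, visits in stats)
    return max(range(len(stats)),
               key=lambda i: ucb1(stats[i][0], stats[i][1], parent_visits, c))
```

In the method described here this rule would apply from the children of $R_0$ downward, while the choice among the children of $R_0$ themselves is made by the resource allocation step.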
In the playing process of Reversi, if the result of the simulation calculation after the step of the …; if the result is a loss, the return is recorded as 0; otherwise, the return is recorded as 0.5.
The mean and variance of all simulation results of each child node are used as the input of the optimal resource allocation algorithm to compute the computing resources of each decision scheme in the next round.
In this example, the computing resource is the number of iterations, or simulations, of the UCT search performed on the sub-Monte Carlo tree $\mathrm{SMCT}_i$. Fig. 2 shows the MCTS search process based on the optimal resource allocation algorithm, and Fig. 3 shows the iterative process of performing Monte Carlo tree search on the child nodes.
After the whole method has been executed, the optimal move for the current board state is returned.
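Turning simulation outcomes into the two statistics the allocation step needs can be sketched as below. The mapping of a loss to 0 and of the remaining non-win case (read here as a draw) to 0.5 follows the example; the value 1.0 for a win is an assumption of this sketch, as are the names `OUTCOME_RETURN` and `summarize`.

```python
import statistics
from typing import Sequence, Tuple

# Return per simulation outcome. The win value 1.0 is assumed for illustration.
OUTCOME_RETURN = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def summarize(outcomes: Sequence[str]) -> Tuple[float, float]:
    """Mean and (population) variance of the returns of one child node."""
    returns = [OUTCOME_RETURN[o] for o in outcomes]
    return statistics.mean(returns), statistics.pvariance(returns)
```

For one win, one loss and two draws the returns are 1.0, 0.0, 0.5 and 0.5, giving a mean of 0.5 and a variance of 0.125.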
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A Monte Carlo tree searching method based on an optimal resource allocation algorithm is characterized by comprising the following steps:
taking the initial state of the problem to be decided as the root node $R_0$ of the Monte Carlo tree; if n actions exist in the corresponding action space, n child nodes of the root node $R_0$ are formed; each child node is used as the root node of a sub-Monte Carlo tree, and each child node is used as a decision scheme of the optimal resource allocation algorithm;
allocating initial computing resources to each decision scheme, performing Monte Carlo tree search iterations of the corresponding computing resource amount on the sub-Monte Carlo tree corresponding to each decision scheme, and recording the return of each iteration;
judging whether the sum $\sum_{i=1}^{n} N_i^l$ of the computing resources used by all decision schemes after the $l$-th round is not less than the maximum available computing resource $T$; wherein $N_i^l$ represents the total computing resources of the $i$-th decision scheme after the $l$-th round of computing resource allocation;
if not, increasing the computing resources by $\Delta$, determining the amount of computing resources actually available to each decision scheme in the $(l+1)$-th round of computation by using the optimal resource allocation algorithm according to the historical returns of each decision scheme, and executing the same iterative computation as in the previous step;
if yes, ending the Monte Carlo tree search process, thereby determining the action corresponding to the decision scheme with the best average performance.
2. The method of claim 1, wherein executing the n actions respectively leads to n new states, which form the n child nodes of the root node $R_0$;
each child node serves as the root node of one sub-Monte Carlo tree, giving n mutually independent sub-Monte Carlo trees $\mathrm{SMCT}_i$, and each child node serves as one decision scheme $\theta_i$ of the optimal resource allocation algorithm.
3. The Monte Carlo tree search method based on the optimal resource allocation algorithm of claim 1, wherein
initially, initial computing resources $N_0$ are allocated to each decision scheme, that is, $l = 0$ and $N_i^0 = N_0$;
for the sub-Monte Carlo tree $\mathrm{SMCT}_i$ corresponding to each decision scheme, Monte Carlo tree search iterations are performed with resource amount $N_0$, and the return of each iteration is recorded.
4. The method as claimed in claim 1, wherein determining, by the optimal resource allocation algorithm and according to the historical returns of each decision scheme, the total computing resource amount $N_i^{l+1}$ of each decision scheme over rounds 1 to $l+1$, and determining therefrom the amount of computing resources actually available to each decision scheme in round $l+1$, comprises:
using the optimal resource allocation algorithm to allocate a total available computing resource amount of $\sum_{i=1}^{n} N_i^l + \Delta$ among the decision schemes according to the mean and variance of the historical returns of each decision scheme, the amount of computing resources obtained by each decision scheme being $N_i^{l+1}$;
denoting any non-optimal decision scheme by $\theta_j$, the optimal decision scheme by $\theta_b$ and the other decision schemes by $\theta_x$, then for all $i \in I = \{1, 2, \dots, n\}$, $j, b \in I$, $j \neq b$, $x \in X = I - \{j, b\}$, the following formulas hold:
wherein:
in the above formulas, $N_j^{l+1}$, $N_b^{l+1}$ and $N_x^{l+1}$ respectively represent the amounts of computing resources obtained in round $l+1$ by the non-optimal decision scheme $\theta_j$, the optimal decision scheme $\theta_b$ and the other decision schemes $\theta_x$; $N$ denotes the total computing resources of the decision scheme indicated by the subscript after resources are allocated in the round indicated by the superscript; $\mu_k(\theta_i)$ represents the return of decision scheme $\theta_i$ at the $k$-th computation; $b$ is the label of the decision scheme with the highest average historical return after the $l$-th round of search iterations; $\mu$ represents the mean of the historical returns and $\delta$ their variance, the superscript $l$ being the round number and the subscript being the label of the decision scheme; the two remaining quantities are intermediate parameters;
combining $N_i^{l+1}$ and $N_i^l$, the difference between the two determines the amount of computing resources actually available to each decision scheme in round $l+1$, for all $i \in I$; that is, the sub-Monte Carlo tree $\mathrm{SMCT}_i$ corresponding to each decision scheme performs Monte Carlo tree search iterations with that resource amount, after which the total computing resources of each decision scheme are updated to reflect the allocation of round $l+1$.
CN201810593129.6A 2018-06-08 2018-06-08 Monte Carlo tree searching method based on optimal resource allocation algorithm Active CN108809713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810593129.6A CN108809713B (en) 2018-06-08 2018-06-08 Monte Carlo tree searching method based on optimal resource allocation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810593129.6A CN108809713B (en) 2018-06-08 2018-06-08 Monte Carlo tree searching method based on optimal resource allocation algorithm

Publications (2)

Publication Number Publication Date
CN108809713A (en) 2018-11-13
CN108809713B (en) 2020-12-25

Family

ID=64088186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810593129.6A Active CN108809713B (en) 2018-06-08 2018-06-08 Monte Carlo tree searching method based on optimal resource allocation algorithm

Country Status (1)

Country Link
CN (1) CN108809713B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130204412A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Optimal policy determination using repeated stackelberg games with unknown player preferences
CN104135769A (en) * 2014-07-01 2014-11-05 宁波大学 Method of OFDMA (Orthogonal Frequency Division Multiple Access) ergodic capacity maximized resource allocation under incomplete channel state information
WO2016123213A1 (en) * 2015-01-30 2016-08-04 Alcatel-Lucent Usa Inc. Frequency resource and/or modulation and coding scheme indicator for machine type communication device
CN105727550A (en) * 2016-01-27 2016-07-06 安徽大学 Dot-grid chess game system based on UCT algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GAO LIN等: "Research on Resource Allocation Evaluation of Collaborative Product Development for Cloud Manufacturing", 《THE 2015 INTERNATIONAL CONFERENCE ON ADVANCES IN CONSTRUCTION MACHINERY AND VEHICLE ENGINEERING》 *
QUN MENG等: "Enhancing pattern search for global optimization with an additive global and local Gaussian Process Model", 《2017 WINTER SIMULATION CONFERENCE》 *
YUNCHUAN LI等: "Monte Carlo tree search with optimal computing budget allocation", 《2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL》 *
刘洋: "点格棋博弈中UCT算法的研究与实现", 《中国优秀硕士学位论文全文数据库》 *
朱怡桦: "运用OCBA法改善求解随机性专案网路最佳化资源分配问题之研究", 《HTTPS://ETD.LIB.NCTU.EDU.TW/CGI-BIN/GS32/TUGSWEB.CGI?O=DNCTUCDR&S=ID=%22GT079832534%22.&SEARCHMODE=BASIC》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859532A (en) * 2019-02-28 2019-06-07 深圳市北斗智能科技有限公司 Break indices method and related apparatus under multiple constraint conditions
CN110209770B (en) * 2019-06-03 2022-04-15 北京邮电大学 Named entity identification method based on strategy value network and tree search enhancement
CN110209770A (en) * 2019-06-03 2019-09-06 北京邮电大学 Named entity recognition method based on policy value network and tree search enhancement
CN110427261A (en) * 2019-08-12 2019-11-08 电子科技大学 Edge computing task allocation method based on deep Monte Carlo tree search
CN112202514A (en) * 2020-10-09 2021-01-08 中国人民解放军国防科技大学 Broadband spectrum sensing method based on reinforcement learning
CN112202514B (en) * 2020-10-09 2022-11-08 中国人民解放军国防科技大学 Broadband spectrum sensing method based on reinforcement learning
CN112700005A (en) * 2020-12-28 2021-04-23 北京环境特性研究所 Abnormal event processing method and device based on Monte Carlo tree search
CN112700005B (en) * 2020-12-28 2024-02-23 北京环境特性研究所 Abnormal event processing method and device based on Monte Carlo tree search
CN112734312B (en) * 2021-03-31 2021-07-09 平安科技(深圳)有限公司 Method for outputting reference data and computer equipment
CN112734312A (en) * 2021-03-31 2021-04-30 平安科技(深圳)有限公司 Method for outputting reference data and computer equipment
CN113935618A (en) * 2021-10-12 2022-01-14 网易有道信息技术(江苏)有限公司 Evaluation method and device for chess playing capability, electronic equipment and storage medium
CN114492910A (en) * 2021-11-03 2022-05-13 北京科技大学 Resource load prediction method for multi-model small-batch production line
CN114492910B (en) * 2021-11-03 2023-11-14 北京科技大学 Resource load prediction method for multi-model small-batch production line

Also Published As

Publication number Publication date
CN108809713B (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN108809713B (en) Monte Carlo tree searching method based on optimal resource allocation algorithm
Bitsakos et al. DERP: A deep reinforcement learning cloud system for elastic resource provisioning
CN110570111A (en) Enterprise risk prediction method, model training method, device and equipment
CN110138612A (en) Cloud software service resource allocation method based on QoS model self-correction
Barrera et al. A review of particle swarm optimization methods used for multimodal optimization
CN106411896A (en) APDE-RBF neural network based network security situation prediction method
US20130262453A1 (en) Estimating Thread Participant Expertise Using A Competition-Based Model
Moradi et al. Automatic skill acquisition in reinforcement learning using graph centrality measures
Villatoro et al. Robust convention emergence in social networks through self-reinforcing structures dissolution
Albrecht et al. Comparative evaluation of MAL algorithms in a diverse set of ad hoc team problems
CN112436992A (en) Virtual network mapping method and device based on graph convolution network
CN113269652A (en) Hypergraph influence propagation method and influence maximization method based on crowd psychology
Pavlenko et al. Criterion of cyber-physical systems sustainability
CN113724096A (en) Group knowledge sharing method based on public goods evolutionary game model
CN109831343B (en) Peer-to-peer network cooperation promotion method and system based on past strategy
Xie et al. Cloud computing resource scheduling based on improved differential evolution ant colony algorithm
JP7382045B1 (en) Multi-agent self-organizing demand response method and system using nested federated learning
Sapin et al. A novel EA-based memetic approach for efficiently mapping complex fitness landscapes
Xu et al. Improving quantal cognitive hierarchy model through iterative population learning
Yu et al. Evolutionary analysis on online social networks using a social evolutionary game
Santos et al. Evolution of Equity Norms in Small‐World Networks
Zhang et al. Designing social norm based incentive schemes to sustain cooperation in a large community
Tomášek et al. Using one-sided partially observable stochastic games for solving zero-sum security games with sequential attacks
Elomda et al. MCDM method based on improved fuzzy decision map
CN116703108B (en) Crowd-sourcing problem selection method and system based on top-k structure hole

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant