JP2014142849A

JP2014142849A - Solution search device, solution search method and solution search program

Info

Publication number: JP2014142849A
Application number: JP2013011629A
Authority: JP
Inventors: Takashi Shiraki; 孝白木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-01-25
Filing date: 2013-01-25
Publication date: 2014-08-07

Abstract

PROBLEM TO BE SOLVED: To provide a solution search device, a solution search method, and a solution search program capable of calculating a solution within a designated calculation time while reducing memory usage in a solution search using simulation. SOLUTION: The solution search device includes a contraction unit 101 that, in a solution search using simulation, removes a node from the search tree when the search tree contains a node that does not have a plurality of child nodes, and executes contraction processing that, when the removed node has a child node, connects that child node to the parent node of the removed node.

Description

The present invention relates to a solution search device, a solution search method, and a solution search program applied to solution search in optimization calculation and the like.

An optimization problem is often one in which an objective function and constraints are given and a single optimal solution that gives the best value of the objective function is derived. In the field of artificial intelligence, search methods that use simulation, such as MCTS (Monte-Carlo Tree Search) described in Non-Patent Document 1, are attracting attention. These search methods are positioned as a development of methods for solving the MBP (Multi-armed Bandit Problem), which is attracting attention in the fields of data mining and machine learning. They are also being put into practical use, and computer Go is an example in which this has already succeeded. In optimization as used in OR (Operations Research) and similar fields, practical use of these search methods is anticipated but remains difficult.

The biggest difference between computer Go and optimization lies in how the tree of the solution space (hereinafter, the solution space tree) is used. In computer Go and the like, the aim at each level of the solution space tree is to find the best node in the next level, whereas in optimization the aim is to find the best node among the solution nodes at the lowest level. Extending the search tree down to the lowest level of the solution space tree is a goal that MCTS has not had before.

There is a technique called pruning that keeps the search tree small and lets the search reach the bottom of the solution space tree within a given calculation time. Pruning narrows the search range to what is feasible within the available calculation time, so more simulations can be spent on the nodes that the simulation results indicate are important. This raises the likelihood of improving the accuracy of solutions found by search methods, such as MCTS, that compute evaluation values by simulation.

Non-Patent Document 1: C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 1, March 2012.
Non-Patent Document 2: P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem", Machine Learning, Vol. 47, pp. 235-256, 2002.

In a tree used for a solution search whose aim is to reach the bottom nodes, as with MCTS, a node without branching arises when there is only one node to expand to, or when pruning leaves a node with one child or fewer. Traversing nodes that have no branching in the search tree is wasted effort.

Accordingly, an object of the present invention is to provide a solution search device, a solution search method, and a solution search program that can calculate a solution within a designated calculation time while reducing memory usage in a solution search using simulation.

The solution search device according to the present invention includes a contraction unit that executes the following contraction processing in a solution search using simulation: when the search tree contains a node that does not have a plurality of child nodes, the node is removed from the search tree, and when the removed node has a child node, that child node is connected to the parent node of the removed node.

In the solution search method according to the present invention, in a solution search using simulation, when the search tree contains a node that does not have a plurality of child nodes, the node is removed from the search tree, and when the removed node has a child node, that child node is connected to the parent node of the removed node.

The solution search program according to the present invention causes a computer to execute, in a solution search using simulation, processing that removes a node from the search tree when the search tree contains a node that does not have a plurality of child nodes and, when the removed node has a child node, connects that child node to the parent node of the removed node.

According to the present invention, in a solution search using simulation, a solution can be calculated within a designated calculation time while reducing memory usage.

A block diagram showing the configuration of the first embodiment of the optimization system.
A flowchart showing the operation of the calculation unit in the first embodiment.
An explanatory diagram showing how the search tree is contracted in the first embodiment.
A block diagram showing an example of the configuration of an optimization system provided with an optimization device that includes two pruning units.
An explanatory diagram showing how the search tree is contracted in the second embodiment.
A block diagram showing the minimum configuration of the solution search device according to the present invention.
A block diagram showing another minimum configuration of the solution search device according to the present invention.

Embodiment 1.
A first embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the optimization system.

As shown in FIG. 1, the optimization system of the first embodiment includes a user terminal 1 and an optimization device 2, which are connected so that they can communicate with each other. Although FIG. 1 shows one user terminal, any number of user terminals may be connected to the optimization device 2.

The user terminal 1 is an information processing terminal such as a personal computer. The user terminal 1 includes an operation unit 11 and a display unit 12.

The operation unit 11 accepts input of the information needed for the optimization calculation to be executed (hereinafter, optimization calculation input information) and of an execution instruction. The operation unit 11 outputs the execution instruction to the optimization device 2 together with the optimization calculation input information.

The display unit 12 receives the solution resulting from the optimization calculation from the optimization device 2 and displays it.

The optimization device 2 includes a GUI (Graphical User Interface) unit 21, a calculation unit 22, and a storage unit 23.

The GUI unit 21 receives the optimization calculation input information from the operation unit 11 of the user terminal 1 and passes it to the calculation unit 22. The GUI unit 21 also receives the solution resulting from the optimization calculation from the calculation unit 22 and passes it to the display unit 12 of the user terminal 1.

The calculation unit 22 includes a selection unit 221, an expansion unit 222, a simulation unit 223, an evaluation value update unit 224, a pruning unit 225, and a contraction (shrinkage) unit 226.

The selection unit 221 selects, from among the expanded nodes, the node on which a playout is to be executed. Hereinafter, the node on which a playout is to be executed is called the selected node.

The expansion unit 222 expands the search tree. Specifically, the expansion unit 222 determines, according to a predetermined criterion, whether the node selected by the selection unit 221 needs to be expanded and, if so, expands nodes one level below it to enlarge the search tree. When it has expanded a node, the expansion unit 222 reselects one of the nodes one level below as the selected node.

The simulation unit 223 executes a simulation. Specifically, the simulation unit 223 performs a playout, that is, it searches for one solution by a simple method such as random simulation, and obtains the evaluation value of that solution.

The evaluation value update unit 224 updates the evaluation value of the solution at each node according to the result of the playout performed by the simulation unit 223. Specifically, it updates the evaluation value of each node stored in the storage unit 23. The evaluation value of each node consists of statistics aggregated from the evaluation values obtained by the repeatedly executed simulations, and the evaluation value update unit 224 updates those statistics. In the present embodiment, the evaluation value update unit 224 calculates the evaluation function values (index values) best, mean, and kbest as evaluation values of the solution.

The pruning unit 225 executes pruning. Specifically, to reduce memory usage and to guide the search to the bottom of the solution space tree within the given calculation time, the pruning unit 225 removes nodes with poor evaluation values together with the branches (edges) that connect them.

The contraction unit 226 removes from the search tree nodes that no longer have branching, for example as a result of pruning.

The storage unit 23 stores the objective function and the constraints. When the optimization system is applied to a scheduling problem, the storage unit 23 stores the data needed to solve the problem, such as task information and information on the persons in charge (hereinafter, problem data). The storage unit 23 also stores information that changes as the calculation in the calculation unit 22 proceeds, such as the evaluation values of nodes. In the present embodiment, the storage unit 23 stores the number of searches and the evaluation value of each node obtained by the calculation unit 22 in the course of the calculation. The storage unit 23 further stores those solutions obtained by the calculation unit 22 that need to be retained.

The GUI unit 21 and the calculation unit 22 are realized, for example, by a computer that operates according to a solution search program. In this case, a CPU of the optimization device 2 reads the solution search program and operates as the GUI unit 21 and the calculation unit 22 according to that program. Alternatively, the GUI unit 21 and the individual units of the calculation unit 22 may each be realized by separate hardware.

The storage unit 23 is realized by a storage device, such as a RAM (Random Access Memory), provided in the optimization device 2.

Next, the operation of the present embodiment will be described.

FIG. 2 is a flowchart showing the operation of the calculation unit 22 in the first embodiment.

Here, the case where the optimization system shown in FIG. 1 is applied to a scheduling problem is taken as an example.

First, the user enters the optimization calculation input information into the operation unit 11 of the user terminal 1. As this information, the user enters problem data such as the tasks to be optimized, the persons available to work on them, and the cost and effectiveness of each person working on each task. At this time, the user also enters an execution instruction into the operation unit 11. The operation unit 11 outputs the optimization calculation input information and the execution instruction to the optimization device 2.

On receiving the execution instruction together with the optimization calculation input information from the user terminal 1, the GUI unit 21 of the optimization device 2 passes the optimization calculation input information to the calculation unit 22. The calculation unit 22 takes in the optimization calculation input information as preprocessing (step S201).

After step S201, the selection unit 221 of the calculation unit 22 selects the node to be simulated from among the expanded nodes (step S202). In the initial state there is only one node, so that node is selected. In the present embodiment, the node is selected using the index values calculated by the evaluation value update unit 224. When selecting a node, the selection unit 221 uses an index value different from the one the pruning unit 225 uses to decide whether to prune. In the present embodiment, the index used for node selection is mean and the index used for pruning is best. That is, each node in the search tree has both meanValue, the evaluation value for node selection, and bestValue, the evaluation value for pruning. Specifically, the evaluation value update unit 224 stores meanValue and bestValue in the storage unit 23 in association with each node in the search tree.
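The selection in step S202 can be illustrated with a minimal Python sketch. The specification only states that selection uses the mean index; the greedy descent below, the `Node` class, and the field names `mean_value` and `best_value` are assumptions made for illustration, with lower normalized values treated as better.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mean_value: float = 0.0  # normalized index for selection (0 = good, 1 = bad)
    best_value: float = 0.0  # normalized index for pruning, kept separately
    children: list[Node] = field(default_factory=list)


def select_node(root: Node) -> Node:
    """Walk down from the root, always taking the child with the lowest mean_value."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: child.mean_value)
    return node


root = Node("A", children=[
    Node("B", mean_value=0.7),
    Node("C", mean_value=0.2, children=[
        Node("D", mean_value=1.0),
        Node("E", mean_value=0.0),
    ]),
])
selected = select_node(root)  # descends A -> C -> E
```

A practical selection rule would usually also account for how often each node has been visited; that part is omitted here because the text does not specify it.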

The expansion unit 222 expands the search tree to the nodes one level below when the number of playouts of the node selected by the selection unit 221 satisfies a predetermined condition (step S203). In the present embodiment, the expansion unit 222 expands the node when the number of playouts exceeds a predetermined number. When there is only one node, as in the initial state, the node is expanded regardless of this condition. After expanding, the expansion unit 222 takes one of the newly expanded nodes as the selected node.
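The expansion condition of step S203 might be sketched as follows. The threshold value, the `Node` class, the `candidates` argument, and the choice of the first new child as the selected node are all assumptions for illustration; the specification says only that "one of" the expanded nodes is taken.

```python
from __future__ import annotations
from dataclasses import dataclass, field

EXPANSION_THRESHOLD = 3  # hypothetical "predetermined number" of playouts


@dataclass(eq=False)
class Node:
    name: str
    playouts: int = 0
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)


def expand_if_needed(node: Node, candidates: list[str]) -> Node:
    """Return the node to simulate from, expanding `node` one level if the condition holds."""
    only_root = node.parent is None and not node.children  # initial state: a single node
    due = node.playouts > EXPANSION_THRESHOLD
    if node.children or not candidates or not (only_root or due):
        return node
    node.children = [Node(name=c, parent=node) for c in candidates]
    return node.children[0]  # assumption: the first new child becomes the selected node


root = Node("A")
first = expand_if_needed(root, ["B", "C"])    # lone root: expanded unconditionally
same = expand_if_needed(first, ["D", "E"])    # 0 playouts: not expanded yet
first.playouts = 4
deeper = expand_if_needed(first, ["D", "E"])  # 4 > 3: expanded one level
```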

The simulation unit 223 executes a playout, that is, a random simulation, from the selected node and searches for one solution (step S204). It is also possible to execute a plurality of simulations for one selected node and search for a plurality of solutions, but here, as the simplest example, a method of executing one simulation for one selected node and searching for one solution is described. The technical scope of the present invention is not limited to executing one simulation for one selected node; a form in which a plurality of simulations are executed for one selected node may also be included in the technical scope of the present invention.
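For the scheduling example, a playout as in step S204 could look like the sketch below. The problem encoding is an assumption: the selected node is represented by a partial task-to-person assignment, each remaining task is given to a uniformly random person, and the evaluation value is the total cost (lower is better). The names `playout`, `cost`, and the sample data are hypothetical.

```python
import random


def playout(partial, tasks, persons, cost, rng):
    """Complete a partial assignment at random and return (assignment, total cost)."""
    assignment = dict(partial)  # decisions already fixed at the selected node
    for task in tasks:
        if task not in assignment:
            assignment[task] = rng.choice(persons)
    value = sum(cost[(assignment[task], task)] for task in tasks)
    return assignment, value


tasks = ["t1", "t2", "t3"]
persons = ["alice", "bob"]
cost = {
    ("alice", "t1"): 3, ("alice", "t2"): 5, ("alice", "t3"): 2,
    ("bob", "t1"): 4, ("bob", "t2"): 1, ("bob", "t3"): 6,
}
# The selected node has already fixed task t1 to alice; the rest is simulated.
solution, value = playout({"t1": "alice"}, tasks, persons, cost, random.Random(0))
```

Real scheduling problems carry constraints (availability, capacity) that a playout would have to respect; they are left out of this sketch.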

The evaluation value update unit 224 uses the solution obtained by the simulation unit 223 to update evaluation values of the solution such as best, mean, and kbest. In the present embodiment, the evaluation value update unit 224 updates the mean and best evaluation values (step S205). The best evaluation value is the best of the results of the simulations executed from the selected node so far. The mean evaluation value is the average of the results of the simulations executed from the selected node so far. The kbest evaluation value is the average of the k best results among the simulations executed from the selected node so far.
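The three statistics can be maintained incrementally, as in the following sketch. It assumes that a lower result is better; the class name `NodeStats` and its attributes are illustrative and are not taken from the specification.

```python
import heapq
import math


class NodeStats:
    """Running best, mean, and kbest of playout results (lower is better)."""

    def __init__(self, k: int):
        self.k = k
        self.count = 0
        self.total = 0.0
        self.best = math.inf
        self._top = []  # negated values of the k best results seen so far

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.best = min(self.best, value)
        heapq.heappush(self._top, -value)
        if len(self._top) > self.k:
            heapq.heappop(self._top)  # drops the worst (largest) retained result

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def kbest(self) -> float:
        return -sum(self._top) / len(self._top)


stats = NodeStats(k=2)
for result in [10.0, 4.0, 7.0, 5.0]:
    stats.update(result)
```

With the four results above, the two best are 4 and 5, so `kbest` averages only those two while `mean` averages all four.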

The evaluation value update unit 224 applies the following normalization to the mean and best evaluation values, which serve as index values.

Let the index values of the selected node and its sibling nodes (nodes that share a parent node with the selected node) be v_1, v_2, v_3, ..., v_L, where L is the total number of child nodes sharing that parent. The maximum value M and the minimum value m can then be expressed as follows.

$$M = \max_{1 \le i \le L} v_i, \qquad m = \min_{1 \le i \le L} v_i$$

Here, according to a predetermined criterion, v_i is judged to be a good value when it is close to the minimum value m and a bad value when it is close to the maximum value M. The evaluation value update unit 224 normalizes v_i so that the good minimum value m maps to 0 and the bad maximum value M maps to 1. Specifically, the evaluation value update unit 224 converts v_i as follows, where Value_i is the normalized value of v_i.

$$\mathrm{Value}_i = \frac{v_i - m}{M - m}$$

Note that v_i may be further normalized so that the values of Value_i fall within a fixed variance.

By executing the above processing for mean and best, the evaluation value update unit 224 calculates the Value_i corresponding to mean and best. Hereinafter, the Value_i corresponding to best, mean, and kbest are written bestValue, meanValue, and kbestValue(k), respectively, where "(k)" indicates that the top k solutions are averaged.
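The normalization across sibling nodes maps the minimum index value m to 0 and the maximum M to 1, which corresponds to Value_i = (v_i - m) / (M - m). A short sketch follows; the function name and the handling of the case M = m (all siblings equal, returned as 0) are assumptions, since the text does not address that case.

```python
def normalize_siblings(values):
    """Min-max normalize sibling index values so the best (min) is 0 and the worst (max) is 1."""
    m, M = min(values), max(values)
    if M == m:
        # Assumption: with no spread among siblings, treat every value as "good".
        return [0.0 for _ in values]
    return [(v - m) / (M - m) for v in values]


mean_values = normalize_siblings([10.0, 20.0, 15.0])
flat_values = normalize_siblings([7.0, 7.0])
```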

The evaluation value update unit 224 calculates meanValue and bestValue for the selected node and for each of its ancestor nodes, and updates the evaluation value of each of those nodes based on the results.
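Updating the selected node and every node above it can be sketched as a walk up the parent links. The `Node` fields and the function name `backpropagate` are illustrative assumptions; only raw statistics are updated here, and normalization across siblings would be applied afterwards.

```python
from __future__ import annotations
import math
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    name: str
    parent: Node | None = None
    count: int = 0
    total: float = 0.0
    best: float = math.inf

    @property
    def mean(self) -> float:
        return self.total / self.count


def backpropagate(node: Node | None, value: float) -> None:
    """Fold one playout result into the node and all of its ancestors."""
    while node is not None:
        node.count += 1
        node.total += value
        node.best = min(node.best, value)  # lower is better
        node = node.parent


root = Node("A")
mid = Node("B", parent=root)
leaf = Node("C", parent=mid)
backpropagate(leaf, 8.0)  # playout from C updates C, B, and A
backpropagate(mid, 4.0)   # playout from B updates B and A only
```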

The pruning unit 225 executes pruning when it judges that the search tree is too large, for example in terms of the total number of nodes or the number of leaf nodes, and that pruning is necessary. It also executes pruning when it judges that the search space must be narrowed for the search tree to reach the bottom of the solution space tree within the given time (step S206). In the present embodiment, the pruning unit 225 evaluates each node by the bestValue updated by the evaluation value update unit 224 in step S205 and prunes nodes with poor evaluation values from the search tree. No further simulations are run from a pruned node or from the nodes below it.
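One way to picture the size-triggered pruning of step S206 is the sketch below. The trigger (a leaf-count limit) follows the text, but the concrete rule of cutting every child whose normalized bestValue exceeds a fixed threshold, and the parameter names `max_leaves` and `threshold`, are assumptions for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    best_value: float = 0.0  # normalized: 0 = good, 1 = bad
    children: list[Node] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)


def prune(node: Node, threshold: float) -> None:
    """Drop every child (with its whole subtree) whose best_value is worse than threshold."""
    node.children = [c for c in node.children if c.best_value <= threshold]
    for child in node.children:
        prune(child, threshold)


def prune_if_needed(root: Node, max_leaves: int, threshold: float) -> bool:
    """Prune only when the tree has grown past max_leaves leaf nodes."""
    if count_leaves(root) <= max_leaves:
        return False
    prune(root, threshold)
    return True


root = Node("A", children=[
    Node("B", 0.0, [Node("B1", 0.2), Node("B2", 0.9)]),
    Node("C", 1.0),
    Node("D", 0.4),
])
pruned = prune_if_needed(root, max_leaves=3, threshold=0.5)
```

After this call node B is left with a single child, which is exactly the kind of node without branching that the contraction step then removes.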

When pruning by the pruning unit 225 or some other cause produces a node in the search tree that has no branching, the contraction unit 226 removes that node (step S207) and directly connects the parent node and the child node of the removed node. In doing so, the contraction unit 226 copies the statistics used by the evaluation function for node selection from that parent node to that child node. Specifically, the contraction unit 226 takes the statistics stored in the storage unit 23 in association with the parent node and stores them again in association with the child node. The statistics used by the evaluation function for pruning are not copied; those of the child node are used as they are.
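The contraction of step S207 can be sketched as follows. This is an illustration under stated assumptions, not the specification's implementation: the sketch removes only non-root nodes with exactly one child, leaves childless leaf nodes in place, and uses the hypothetical field names `select_stats` and `prune_stats` for the two sets of statistics.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    select_stats: dict = field(default_factory=dict)  # used for node selection
    prune_stats: dict = field(default_factory=dict)   # used for pruning
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def contract(root: Node) -> list[str]:
    """Splice out every non-root node that has exactly one child; return removed names."""
    removed = []
    stack = [root]
    while stack:
        node = stack.pop()
        parent = node.parent
        if parent is not None and len(node.children) == 1:
            child = node.children[0]
            parent.children[parent.children.index(node)] = child  # connect directly
            child.parent = parent
            child.select_stats = dict(parent.select_stats)  # selection statistics copied
            removed.append(node.name)                       # prune_stats stay the child's own
            stack.append(child)  # the child may now be re-examined under its new parent
        else:
            stack.extend(node.children)
    return removed


a = Node("A", select_stats={"mean": 6.0})
b = a.add(Node("B"))
c = a.add(Node("C"))
c.add(Node("I")); c.add(Node("J"))
d = b.add(Node("D"))
e = d.add(Node("E", prune_stats={"best": 4.0}))
e.add(Node("G")); e.add(Node("H"))
removed = contract(a)  # B and D form a chain without branching
```

In this hypothetical tree the chain A, B, D, E collapses so that A connects directly to E, while C keeps its place because it still branches.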

FIG. 3 is an explanatory diagram showing how the search tree is contracted in the first embodiment. The search tree shown in FIG. 3(a) has node A as its root node, with nodes B to K below it. Node G has three nodes (nodes L, M, and N) that are candidates for expansion. The "x" marks indicate that nodes F, J, and K are targets of pruning.

FIG. 3(b) shows the search tree of FIG. 3(a) after pruning has been executed. FIG. 3(c) shows the search tree of FIG. 3(b) after contraction processing. In FIG. 3(c), nodes B and D, which no longer have branching as a result of pruning, have been removed, and node A and node E are directly connected. FIG. 3(d) shows the search tree of FIG. 3(c) after node expansion has been executed.

The calculation unit 22 repeatedly executes the processing of steps S202 to S207 (selection, node expansion, simulation execution, evaluation value update, pruning, and contraction) until the calculation time in the calculation unit 22 reaches a predetermined upper limit. That is, when the calculation time has not reached the upper limit (Yes in step S208), the calculation unit 22 returns to the processing of step S202. When the calculation time has reached the upper limit (No in step S208), the calculation unit 22 ends the calculation and passes the optimization calculation result, that is, solution information indicating the solution obtained by the search, to the GUI unit 21 (step S209). Instead of using the calculation time as the stopping condition, the calculation unit 22 may repeat the processing of steps S202 to S207 until a solution value given as a requirement is obtained.
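The outer loop around steps S202 to S207 might be sketched as below. The whole iteration is abstracted into a caller-supplied `step` function that returns the value of the solution found in that iteration; `run_search`, `time_limit_s`, and `target` are hypothetical names, and lower values are assumed to be better.

```python
import time


def run_search(step, time_limit_s, target=None):
    """Repeat `step` until the time limit is reached or a required value is met."""
    deadline = time.monotonic() + time_limit_s
    best = None
    iterations = 0
    while time.monotonic() < deadline:      # corresponds to the check in step S208
        value = step()                      # one pass of steps S202 to S207
        iterations += 1
        if best is None or value < best:
            best = value
        if target is not None and best <= target:
            break                           # alternative stop: required value reached
    return best, iterations


results = iter([5.0, 3.0, 1.0])
best, iterations = run_search(lambda: next(results), time_limit_s=5.0, target=3.0)
nothing = run_search(lambda: 0.0, time_limit_s=0.0)  # no time budget: no iteration runs
```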

In the calculation processing of steps S202 to S207, the calculation unit 22 stores in the storage unit 23 information that includes the number of searches and the evaluation value of each node obtained in the course of the calculation. The calculation unit 22 also stores in the storage unit 23 information that includes the solutions obtained by the search. By reading the information stored in the storage unit 23, the calculation unit 22 can determine the number of searches and the evaluation value of each node at any point during the calculation.

In the present embodiment, the case where the problem data is input from the user terminal 1 to the calculation unit 22 as optimization calculation input information has been taken as an example, but the calculation unit 22 may instead obtain problem data stored in the storage unit 23. To realize such a form, a user or the like stores the problem data in the storage unit 23 in advance.

The timing of the contraction processing of step S207 is not limited to after the pruning processing of step S206. The contraction unit may also be included in the pruning unit, and when the calculation unit includes a plurality of pruning units, each pruning unit may include a contraction unit. FIG. 4 is a block diagram showing an example of the configuration of an optimization system provided with an optimization device that includes two pruning units. In the example shown in FIG. 4, the calculation unit 22 includes a first pruning unit 2251, which performs pruning to limit the size of the search tree, and a second pruning unit 2252, which performs pruning so that the search tree reaches the bottom of the solution space tree within the designated calculation time. Each of the first pruning unit 2251 and the second pruning unit 2252 includes a contraction unit 226. That is, the calculation unit 22 shown in FIG. 4 executes contraction processing twice in the calculation processing for one selected node.

As described above, in the present embodiment the search tree is contracted (shrinkage) when pruning is executed, which removes the useless nodes without branching that pruning produces. The number of nodes in the search tree can therefore be reduced further. In addition, the parent node and the child node of each removed node are connected directly, which shortens the time needed to traverse from the root node of the search tree to a leaf node and thus shortens the calculation time of the solution search. Therefore, according to the present embodiment, in a solution search using simulation, a solution can be calculated within the designated calculation time while reducing memory usage.

In the present embodiment, each node separately holds the statistics used by the evaluation function for node selection and the statistics used by the evaluation function for pruning. When a node without branching is removed, the node-selection statistics held by the parent node of that node are copied to the child node of that node. Compared with a system that simply removes nodes, this keeps the evaluation function values for node selection appropriate, so the accuracy of the solution is less likely to degrade.

The present embodiment has been described for the case where the index value used by the selection unit 221 to select a node differs from the index value used by the pruning unit 225 to execute pruning. However, the two index values may be the same. That is, the index used for node selection and the index used for pruning may be identical.

Embodiment 2.
Hereinafter, a second embodiment of the present invention will be described with reference to the drawings.

The configuration of the optimization system in the second embodiment is the same as that of the first embodiment.

Here, as in the first embodiment, the case where the optimization system is applied to a scheduling problem is taken as an example.

The operation of the calculation unit 22 in the second embodiment is the same as the operation in the first embodiment shown in FIG. 2.

However, the operation of the selection unit 221 in step S202, the operation of the evaluation value update unit 224 in step S205, and the operation of the contraction unit 226 in step S207 differ. The operations in steps S202, S205, and S207 are described below.

In step S202, the selection unit 221 uses the index values held by the branches (edges) when selecting a node. Specifically, the index value held by an edge is the index value that the evaluation value update unit 224 has stored in the storage unit 23 in association with that edge.

In step S205, when performing the normalization process, the evaluation value update unit 224 normalizes not only the index values of the selected node and its sibling nodes but also the index values of the edges of those nodes, as follows.

The evaluation value update unit 224 calculates the index value meanValue of the selected node and its sibling nodes using Equations 1 to 3. Based on the calculation result, the evaluation value update unit 224 then updates the evaluation values of the selected node and of each of its ancestor nodes.

Similarly, the evaluation value update unit 224 calculates meanValue for each edge connected to the selected node and its sibling nodes using Equations 1 to 3. Based on the calculation result, the evaluation value update unit 224 then updates the evaluation value of each edge on the route connecting the selected node to its ancestor nodes. For example, in the search tree shown in FIG. 3(d), when node L is the selected node, the evaluation values of the edge connecting nodes L and G, the edge connecting nodes G and C, and the edge connecting nodes C and A are updated.
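The route-wise edge update can be sketched in Python as follows. Equations 1 to 3 are not reproduced in this passage, so the sketch substitutes a plain running mean as an assumed stand-in for the meanValue calculation and omits the normalization across sibling edges. The dictionary-based tree encoding, the function names, and node B (a sibling placed off the route) are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]                     # (parent name, child name)
EdgeStats = Dict[Edge, Tuple[int, float]]  # edge -> (visits, mean value)


def path_edges(parent_of: Dict[str, str], selected: str) -> List[Edge]:
    """Edges on the route from the selected node up to the root."""
    edges: List[Edge] = []
    node = selected
    while node in parent_of:
        edges.append((parent_of[node], node))
        node = parent_of[node]
    return edges


def update_edges(stats: EdgeStats, parent_of: Dict[str, str],
                 selected: str, result: float) -> None:
    """Fold one simulation result into every edge on the route."""
    for edge in path_edges(parent_of, selected):
        visits, mean = stats.get(edge, (0, 0.0))
        visits += 1
        mean += (result - mean) / visits  # running mean (assumed stand-in)
        stats[edge] = (visits, mean)


# Route from the text: L -> G -> C -> A, where A is the root.
# B is a hypothetical node off the route; its edge must stay untouched.
parent_of = {"L": "G", "G": "C", "C": "A", "B": "A"}
stats: EdgeStats = {}
update_edges(stats, parent_of, "L", 1.0)
update_edges(stats, parent_of, "L", 0.0)
```

After the two calls, only the three edges L-G, G-C, and C-A carry statistics, which matches the example in the text where node L is the selected node.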

In step S207, when the contraction unit 226 removes a node that has no branch, it directly connects the parent node and the child node of that node. In doing so, the contraction unit 226 removes the node together with the edge connecting the node to its child node, and reconnects the edge that connected the node to its parent node so that it ends at the node's child node.

FIG. 5 is an explanatory diagram showing how the search tree is contracted in the second embodiment. As shown in FIG. 5(c), in this embodiment the edge that connected node A to node B, which no longer has a branch as a result of pruning, becomes the edge connecting node A to the child node of node B (node E).
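A minimal Python sketch of this edge-preserving contraction is given below. The `Edge` and `Node` classes, the `value` field standing for the index value held by an edge, and nodes C, H, and I are assumptions added for illustration; only nodes A, B, and E come from the description of FIG. 5.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edge:
    value: float  # index value held by the edge (hypothetical field name)
    child: "Node"


@dataclass
class Node:
    name: str
    edges: List[Edge] = field(default_factory=list)  # edges to child nodes


def contract(node: Node) -> None:
    """Retarget each edge past single-child nodes, keeping the edge's value."""
    for edge in node.edges:
        while len(edge.child.edges) == 1:
            # Drop the skipped node and its outgoing edge; the incoming
            # edge survives and now ends at the grandchild.
            edge.child = edge.child.edges[0].child
        contract(edge.child)


e = Node("E", [Edge(0.4, Node("H")), Edge(0.6, Node("I"))])
b = Node("B", [Edge(0.2, e)])  # B has no branch after pruning
a = Node("A", [Edge(0.7, b), Edge(0.3, Node("C"))])
contract(a)
```

After the call, the edge that used to run from A to B ends at E and still carries the value 0.7, while the former B-to-E edge with value 0.2 is no longer reachable from the tree.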

As described above, in this embodiment, as in the first embodiment, the useless nodes without branches that pruning produces can be removed. The same effects as in the first embodiment can therefore be obtained.

In each embodiment, the case where the optimization device is applied to a scheduling problem has been taken as an example, but the scope of the present invention is not limited to this. The present invention can be applied to optimization problems in general, in particular combinatorial optimization problems such as scheduling problems that assign tasks to persons in charge. It can also be applied to solution searches other than optimization problems.

FIG. 6 is a block diagram showing the minimum configuration of the solution search device according to the present invention. FIG. 7 is a block diagram showing another minimum configuration of the solution search device according to the present invention.

As shown in FIG. 6, the solution search device (corresponding to the optimization device 2 shown in FIG. 1) includes a contraction unit 101 (corresponding to the contraction unit 226 of the optimization device 2 shown in FIG. 1). In a solution search using simulation, when the search tree contains a node that does not have a plurality of child nodes, the contraction unit 101 executes a contraction process that removes the node from the search tree and, if the removed node has a child node, connects that child node to the parent node of the removed node.

With such a configuration, the number of nodes in the search tree can be reduced further. This reduces the time needed to traverse from the root node of the search tree to a leaf node, that is, the calculation time of the solution search. Therefore, a solution search using simulation can calculate a solution within the designated calculation time while reducing memory usage.

The above embodiments also disclose the following solution search devices.

(1) A solution search device in which the contraction unit 101 executes the contraction process in a Monte Carlo tree search using simulation.

With such a configuration, the number of nodes in the search tree can be reduced further in a Monte Carlo tree search.

(2) As shown in FIG. 7, a solution search device that includes: an execution unit 102 (corresponding to the selection unit 221, the expansion unit 222, and the simulation unit 223 of the optimization device 2 shown in FIG. 1) that selects a node to be simulated from among the candidate nodes in the search tree and executes a simulation from the selected node; an update unit 103 (corresponding to the evaluation value update unit 224 of the optimization device 2 shown in FIG. 1) that calculates an evaluation value from the simulation result using an evaluation function and, based on that evaluation value, updates the evaluation values of the selected node and its ancestor nodes; and a pruning unit 104 (corresponding to the pruning unit 225 of the optimization device 2 shown in FIG. 1) that removes from the search tree any node whose evaluation value does not satisfy a predetermined criterion, wherein the contraction unit 101 executes the contraction process after the pruning unit 104 has removed a node from the search tree.

With such a configuration, contracting the search tree (shrinkage) after pruning is executed eliminates the useless nodes without branches that pruning produces.

(3) A solution search device in which the update unit 103 uses an evaluation function for node selection to calculate the node-selection evaluation value that the execution unit 102 uses when selecting a node, and uses an evaluation function for pruning to calculate the pruning evaluation value that the pruning unit 104 uses when removing a node from the search tree.

With such a configuration, each node can separately hold the statistics used by the evaluation function for node selection and the statistics used by the evaluation function for pruning.

(4) A solution search device in which, when a node removed from the search tree by the contraction unit 101 has a child node, the update unit 103 updates the node-selection evaluation value of that child node based on the evaluation value calculated using the node-selection evaluation function of the removed node's parent node.

With such a configuration, the evaluation function values for node selection can be kept appropriate compared with a system that simply removes nodes and does not perform the contraction process, so the accuracy of the solution is less likely to deteriorate.

(5) A solution search device in which, when a node removed from the search tree by the contraction unit 101 has a child node, the update unit 103 updates the pruning evaluation value of that child node based on the evaluation value calculated using the pruning evaluation function of that child node.

With such a configuration, the evaluation function values for pruning can be kept appropriate compared with a system that simply removes nodes and does not perform the contraction process, so the accuracy of the solution is less likely to deteriorate.

DESCRIPTION OF SYMBOLS
1 User terminal
2 Optimization device
11 Operation unit
12 Display unit
21 GUI unit
22 Calculation unit
23 Storage unit
101, 226 Contraction unit
102 Execution unit
103 Update unit
104, 225 Pruning unit
221 Selection unit
222 Expansion unit
223 Simulation unit
224 Evaluation value update unit
2251 First pruning unit
2252 Second pruning unit

Claims

1. A solution search device comprising a contraction unit that, in a solution search using simulation, when the search tree contains a node that does not have a plurality of child nodes, executes a contraction process that removes the node from the search tree and, if the removed node has a child node, connects the child node to the parent node of the removed node.

2. The solution search device according to claim 1, wherein the contraction unit executes the contraction process in a Monte Carlo tree search using simulation.

3. The solution search device according to claim 1, further comprising:
an execution unit that selects a node to be simulated from among the candidate nodes in the search tree and executes a simulation from the selected node;
an update unit that calculates an evaluation value from the simulation result using an evaluation function and, based on that evaluation value, updates the evaluation values of the selected node and its ancestor nodes; and
a pruning unit that removes from the search tree a node whose evaluation value does not satisfy a predetermined criterion,
wherein the contraction unit executes the contraction process after the pruning unit has executed the process of removing a node from the search tree.

4. The solution search device according to claim 3, wherein the update unit uses an evaluation function for node selection to calculate the node-selection evaluation value that the execution unit uses when selecting a node, and uses an evaluation function for pruning to calculate the pruning evaluation value that the pruning unit uses when removing a node from the search tree.

5. The solution search device according to claim 4, wherein, when the node removed from the search tree by the contraction unit has a child node, the update unit updates the node-selection evaluation value of the child node based on the evaluation value calculated using the node-selection evaluation function of the parent node of the removed node.

6. The solution search device according to claim 4, wherein, when the node removed from the search tree by the contraction unit has a child node, the update unit updates the pruning evaluation value of the child node based on the evaluation value calculated using the pruning evaluation function of the child node.

7. A solution search method comprising, in a solution search using simulation, when the search tree contains a node that does not have a plurality of child nodes, removing the node from the search tree and, if the removed node has a child node, connecting the child node to the parent node of the removed node.

8. A solution search program for causing a computer to execute a process that, in a solution search using simulation, when the search tree contains a node that does not have a plurality of child nodes, removes the node from the search tree and, if the removed node has a child node, connects the child node to the parent node of the removed node.