JP7376405B2 - Optimization processing device, optimization processing method, and optimization processing program

Info

Publication number: JP7376405B2
Application number: JP2020056611A
Authority: JP
Inventors: 文鵬魏; 有哉岡留; 敏子相薗
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2023-11-08
Anticipated expiration: 2040-03-26
Also published as: JP2021157456A

Description

The present invention relates to an optimization processing device, an optimization processing method, and an optimization processing program that search for an optimal solution to a combinatorial optimization problem.

Conventionally, mathematical optimization techniques have been applied to combinatorial optimization problems such as production planning and delivery planning. For example, Japanese Patent Application Laid-Open No. 2017-120561 (Patent Document 1) describes "a logistics planning device that formulates a logistics plan covering the receipt through the delivery of transported goods at a transported goods storage yard, the device comprising: logistics model construction means for constructing a logistics model, which is a mathematical model obtained by modeling the storage areas in the transported goods storage yard, the transport equipment that handles transported goods in the yard, the transported goods, and the logistics planning policy; logistics planning means for formulating a logistics plan by a mathematical optimization method based on the logistics model constructed by the logistics model construction means; and output means for outputting the logistics plan formulated by the logistics planning means."

Patent Document 1: Japanese Patent Application Laid-Open No. 2017-120561

The conventional technology represented by Patent Document 1 relies on manual modeling, which increases cost. It also presupposes that the constraints relevant to optimizing the target problem are already held as business knowledge or tacit knowledge. It therefore cannot be applied to unknown problems for which no track record exists.

Therefore, an object of the present invention is to provide a technique that, in searching for an optimal solution to a combinatorial optimization problem, constructs a model autonomously and is applicable to unknown problems.

To achieve the above object, a representative optimization processing device, optimization processing method, and optimization processing program of the present invention generate, from learning data, a prediction model that predicts the evaluation value of a combination, and limit the search range based on the distribution of the learning data when searching for an optimal solution using the prediction model.

According to the present invention, when searching for an optimal solution to a combinatorial optimization problem, a model can be constructed autonomously and applied to an unknown problem.
Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1: Conceptual diagram of processing by the optimization processing device according to the embodiment.
FIG. 2: Explanatory diagram of the prediction model and the distribution of the learning data.
FIG. 3: Explanatory diagram of a specific example of a combinatorial optimization problem.
FIG. 4: Configuration diagram of the optimization processing device.
FIG. 5: Flowchart showing the processing procedure of the optimization processing device.
FIG. 6: Explanatory diagram of the processing procedure of the optimization processing device.
FIG. 7: Specific example of the learning result display screen.
FIG. 8: Specific example of the optimal solution search screen.
FIG. 9: Explanatory diagram of a modification of the optimization processing device.

Embodiments are described below with reference to the drawings.

FIG. 1 is a conceptual diagram of processing by the optimization processing device according to the embodiment. The optimization processing device according to this embodiment first generates a prediction model from learning data. The learning data is a set of past problems and their solutions. A specific example is given later; for instance, if the problem is the order of work across a plurality of processes, the order (combination) in which the work was actually performed is the solution. The prediction model is generated using, for example, deep learning. The prediction model is a function that obtains a predicted evaluation value from a combination that is a solution candidate. The predicted evaluation value is an index for evaluating combinations relative to one another, and is used to compare combinations and search for a more suitable solution.

When a problem is given, the optimization processing device performs optimization processing using the generated prediction model and outputs an optimal solution. Specifically, it obtains predicted evaluation values for combinations that are candidate solutions to the given problem, and finds the optimal solution by searching for combinations with higher predicted evaluation values.

Here, the optimization processing device according to this embodiment limits the search range for the optimal solution based on the distribution of the learning data. This is because a prediction model generated from learning data can be expected to be highly accurate within the distribution of the learning data, but its accuracy decreases outside that distribution.

FIG. 2 is an explanatory diagram of the prediction model and the distribution of the learning data. In FIG. 2, the prediction model is displayed with the predicted evaluation value on the z-axis. The optimization processing therefore searches for the combination that maximizes the value of z (in FIG. 2, a combination of x and y) as the optimal solution.

This search is limited to the distribution range of the learning data and its vicinity. In FIG. 2, the region of the prediction model surface that corresponds to the distribution range of the learning data is enclosed by a white line. The optimization processing can be described as comparing the values of z within and near the region enclosed by this white line and searching for the maximum.

The reason for including the vicinity of the distribution range of the learning data in the search range is as follows. Specifically, the search range is larger than the distribution range of the learning data and contains it. In principle, if learning data of sufficient quantity and coverage were available, a highly accurate prediction model could be generated over a wide range. However, when the number of combinations that can be solutions is enormous, obtaining learning data of sufficient quantity and coverage becomes unrealistic. In other words, if the solution space is vast, the learning data is inevitably sparse and biased.

As already explained, searching within the distribution range of the learning data is effective for obtaining highly accurate predicted evaluation values. On the other hand, making the search range larger than the distribution range of the learning data means that an optimal solution outside that distribution range may be obtained. If work is carried out according to such an optimal solution and the resulting work record is used as new learning data, the distribution range of the learning data can be expanded. That is, making the search range larger than the distribution range of the learning data allows the distribution range to grow and the prediction model to mature.

FIG. 3 is an explanatory diagram of a specific example of a combinatorial optimization problem. FIG. 3 shows a work order optimization problem covering multiple processes in a warehouse. Specifically, it covers the work of taking ordered products out of the warehouse, labeling them, sorting them, packing them, and shipping them.

For example, if the number of orders is 1000 per day, the number of workers is 500, and the 8-hour operating time is divided into one-hour time slots, the number of work order combinations is "1000! × 500! × 8!". The combinations of work orders are thus enormous, and moreover the conditions for the optimal work order differ from process to process. Specifically, for product labeling, products with the same type of label should be handled consecutively; for sorting with automated transport shelves, products that tend to be ordered together should be handled consecutively; for packing, orders with the same packing method should be handled consecutively; and for shipping, orders with nearby shipping destinations should be handled consecutively.
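To give a feel for the size of "1000! × 500! × 8!", the following minimal sketch estimates its order of magnitude with the log-gamma function instead of building the integer. The variable names are illustrative and not taken from the patent; under these numbers the count comes out at roughly 10^3706.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)

# Values taken from the example in the text.
orders, workers, time_slots = 1000, 500, 8

log10_count = (
    log10_factorial(orders)
    + log10_factorial(workers)
    + log10_factorial(time_slots)
)
digits = math.floor(log10_count) + 1
print(f"1000! x 500! x 8! has about {digits} decimal digits")
```

A count with thousands of digits rules out exhaustive enumeration, which is why the embodiment relies on a learned prediction model and a restricted search range.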

If optimization of such work is to be newly examined, it is difficult to perform modeling manually using past information, business knowledge, and tacit knowledge. Moreover, even if past information is accumulated, identical work cannot be expected to recur because the number of combinations is enormous.

For such a problem, the optimization processing device according to this embodiment can autonomously generate a prediction model by machine learning using learning data. As the learning data, for example, the order (combination) of work performed in the past is used. The learning data is sparse and biased relative to the solution space, but a reasonable solution can be obtained by limiting the search range based on the distribution range of the learning data. In addition, by including the vicinity of the distribution range of the learning data in the search range and adding the records of new work based on the solutions to the learning data, the distribution range of the learning data can be widened.

FIG. 4 is a configuration diagram of the optimization processing device. As shown in FIG. 4, the optimization processing device 10 includes a CPU (Central Processing Unit) 11, a memory 12, a display unit 13, an input unit 14, and a storage unit 15. The CPU 11 operates as various functional units by loading programs read from an auxiliary storage device (for example, the storage unit 15) into the memory 12, which is the main storage device, and executing them. FIG. 4 shows a state in which programs that operate as the prediction model generation unit 21, the distance calculation unit 22, the evaluation value prediction unit 23, and the search unit 24 have been loaded into the memory 12.

The display unit 13 is an output device such as a liquid crystal display panel, and the input unit 14 is an input device such as a keyboard or touch panel. The storage unit 15 is a hard disk drive or the like. The storage unit 15 stores various data and programs; the data particularly important to this embodiment are the learning data 31 and the prediction model data 32. As already explained, the learning data 31 is the data used to generate the prediction model. The prediction model data 32 is data that specifies the prediction model.

The prediction model generation unit 21 is a processing unit that performs deep learning using the learning data 31 and generates a prediction model. The prediction model generation unit 21 stores the prediction model data 32, which specifies the generated prediction model, in the storage unit 15.

The distance calculation unit 22 is a processing unit that calculates the distance from the distribution of the learning data 31 to the solution candidate data.
The evaluation value prediction unit 23 is a processing unit that reads the prediction model data 32 and calculates the predicted evaluation value of the solution candidate data using the prediction model specified by the prediction model data 32.
The search unit 24 is a processing unit that searches for an optimal solution using the predicted evaluation value. When searching for the optimal solution, the search unit 24 limits the search range based on the distribution of the learning data 31.

Learning of the prediction model is described next. When a prediction model is learned from the learning data 31, accurate prediction of the actual index is difficult because the learning data 31 is relatively scarce. For example, in the example shown in FIG. 3, predicting "the number of products one worker can process in one hour" as the actual index results in a large error.

Therefore, the optimization processing device 10 does not predict the actual index directly but instead obtains the relative order of actual index values. That is, the optimization processing device 10 reduces the error by converting the regression problem into a ranking problem. The predicted evaluation value is an index for finding combinations with a better actual index, and corresponds to the actual index. In this embodiment, a larger predicted evaluation value is assumed to be better. Let this predicted evaluation value be V(x). Then

[equation not reproduced in this text]

where

[equation not reproduced in this text]

A prediction model is generated by ranking learning using this P_ij.
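Because the equations defining P_ij are not reproduced in this text, the following is only a sketch of one common pairwise ranking formulation (a RankNet-style logistic loss), not necessarily the formulation used in the patent. It assumes P_ij is the logistic function of V(x_i) − V(x_j) and penalizes pairs whose predicted order disagrees with the order of their actual indices. The function names and the score values are hypothetical; the first two actual index values are those quoted for FIG. 7.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def pairwise_ranking_loss(scores, actual):
    """Mean of -log P_ij over pairs where item i has the better actual index.

    Assumption: P_ij = 1 / (1 + exp(-(V_i - V_j))), so -log P_ij = softplus(V_j - V_i).
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if actual[i] > actual[j]:  # i should be ranked above j
                total += softplus(scores[j] - scores[i])
                pairs += 1
    return total / pairs if pairs else 0.0

# Actual indices: 55.8 and 55.1 appear in the FIG. 7 example; 50.0 is hypothetical.
actual_index = [55.8, 55.1, 50.0]
consistent_scores = [3.0, 2.0, 0.5]    # hypothetical V(x), same order as actual_index
inconsistent_scores = [0.5, 2.0, 3.0]  # hypothetical V(x), reversed order

print(pairwise_ranking_loss(consistent_scores, actual_index))
print(pairwise_ranking_loss(inconsistent_scores, actual_index))
```

The loss depends only on score differences, which reflects the point made in the text: the model needs to order candidates correctly rather than reproduce the actual index value.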

Calculation of the distance is described next. First, assuming that the features of the learning data 31 follow a multidimensional Gaussian distribution and estimating the distribution parameters of the learning data 31, the Mahalanobis distance between the features of a given data item and the feature distribution of the learning data 31 is

[equation not reproduced in this text]

where

[equation not reproduced in this text]
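As an illustration of this step, the sketch below uses the standard textbook definition of the Mahalanobis distance, D(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), with μ and Σ estimated as the sample mean and sample covariance of the learning data features. The notation, the function names, and the sample points are assumptions for illustration; the patent's own equations are not reproduced in this text.

```python
import math

def mean_and_covariance(samples):
    """Estimate the Gaussian parameters (mean vector, sample covariance matrix)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov

def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y

def mahalanobis(x, mu, cov):
    """Distance of feature vector x from the distribution described by (mu, cov)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    weighted = solve(cov, diff)  # equals inverse(cov) @ diff
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Hypothetical 2-D learning data features.
samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mu, cov = mean_and_covariance(samples)
print(mahalanobis((1.0, 1.0), mu, cov))  # at the mean of the data
print(mahalanobis((3.0, 1.0), mu, cov))  # outside the sampled square
```

Solving the linear system rather than inverting the covariance matrix explicitly is a design choice of this sketch; it gives the same distance with less numerical error.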

The optimization processing device 10 finds the optimal solution using the following loss function, with x = (π_θ(t|c), c).

[equation not reproduced in this text]

If λ is set sufficiently large, the loss function equals "−V(x)" when the Mahalanobis distance from the learning data distribution is α or less, and takes a large value otherwise.
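One loss function with the behavior described here is a hinge penalty on the distance beyond α. The sketch below assumes the form −V(x) + λ · max(0, distance − α); this is an illustrative assumption consistent with the stated behavior, since the patent's exact expression is not reproduced in this text. The function and parameter names are hypothetical.

```python
def penalized_loss(value: float, distance: float, alpha: float, lam: float) -> float:
    """Return -V(x) inside the distance threshold, plus a penalty outside it.

    value    : predicted evaluation value V(x)
    distance : Mahalanobis distance of x from the learning data distribution
    alpha    : distance threshold that defines the search range
    lam      : penalty weight (lambda); large values make the limit nearly hard
    """
    return -value + lam * max(0.0, distance - alpha)

# Inside the search range the loss is exactly -V(x).
print(penalized_loss(value=10.0, distance=1.0, alpha=2.0, lam=1000.0))
# Outside it, the penalty dominates.
print(penalized_loss(value=10.0, distance=3.0, alpha=2.0, lam=1000.0))
```

Minimizing this loss maximizes V(x) while keeping candidates within distance α of the learning data, which is how the search range restriction enters the optimization.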

FIG. 5 is a flowchart showing the processing procedure of the optimization processing device 10. The prediction model generation unit 21 of the optimization processing device 10 first stores the learning data 31 in the storage unit 15 (step S101). The prediction model generation unit 21 then generates a prediction model from the learning data 31 and stores the prediction model data 32, which specifies the generated prediction model, in the storage unit 15 (step S102).

After the prediction model has been generated, the evaluation value prediction unit 23 determines the evaluation target data at a predetermined timing (step S103). The evaluation target data corresponds to one combination that is a solution candidate. The evaluation value prediction unit 23 reads the prediction model data 32 and calculates the predicted evaluation value of the evaluation target data using the prediction model specified by the prediction model data 32 (step S104).

The distance calculation unit 22 calculates the Mahalanobis distance between the evaluation target data and the distribution of the learning data 31 (step S105).

The search unit 24 evaluates the loss with the loss function based on the predicted evaluation value and the distance (step S106). After step S106, the search unit 24 determines whether to end the search (step S107). The search ends, for example, when the optimal solution has been identified or when the search steps (steps S103 to S107) have been repeated a predetermined number of times. If the search is not to be ended (step S107; No), the process returns to step S103 and the next evaluation target data is determined. If the search is to be ended (step S107; Yes), the process proceeds to step S108, the optimal solution is output, and the processing ends.
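The loop of steps S103 to S108 can be sketched as a simple random local search. Everything concrete here is a toy assumption for illustration: the candidate is a 2-D point as in FIG. 2 rather than a work order, `predicted_value` stands in for the learned prediction model, the distance uses a diagonal covariance, the penalty form of the loss is assumed, and the search stops after a fixed number of steps. The patent does not specify the optimization algorithm.

```python
import math
import random

def predicted_value(x):
    """Toy stand-in for the prediction model V(x); it grows without bound."""
    return x[0] + x[1]

def distance_from_data(x, mu=(0.0, 0.0), sigma=(1.0, 1.0)):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sigma)))

def loss(x, alpha=2.0, lam=1000.0):
    """Assumed loss: -V(x) plus a penalty for leaving the search range."""
    return -predicted_value(x) + lam * max(0.0, distance_from_data(x) - alpha)

def search(max_steps=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_loss = loss(best)
    for _ in range(max_steps):  # S107: end after a fixed number of steps
        # S103: determine the next evaluation target data near the current best
        candidate = tuple(b + rng.gauss(0.0, step_size) for b in best)
        # S104 to S106: predicted value, distance, and loss for the candidate
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss  # S108: output the best solution found

best, best_loss = search()
print(best, predicted_value(best), distance_from_data(best))
```

Because the toy V(x) keeps increasing away from the data, an unrestricted search would drift arbitrarily far from the learning data; with the penalty, the result stays at about distance α from it.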

FIG. 6 is an explanatory diagram of the processing procedure of the optimization processing device 10. In FIG. 6, a prediction model is generated from the learning data and used in the optimization processing. In the optimization processing, the process of obtaining the predicted evaluation value of solution candidate data using the prediction model and the process of obtaining the next solution candidate data from the predicted evaluation value and the distance using an optimization algorithm are repeated to find the optimal solution.

FIG. 7 is a specific example of the learning result display screen. In FIG. 7, a graph is displayed with the actual index (the number of products one worker can process in one hour) on the x-axis and the predicted evaluation value on the y-axis. The work order can also be displayed together with the actual index and the predicted evaluation value. Specifically, FIG. 7 shows the following being displayed:
"Work order: B C A ... G
Actual index: 55.8
Predicted evaluation value: 933.1"
"Work order: D A E ... H
Actual index: 55.1
Predicted evaluation value: 905.6"
With this display, the optimization processing device 10 lets the user identify, for example, the combination with the highest predicted evaluation value and the combination with the lowest predicted evaluation value.

FIG. 8 is a specific example of the optimal solution search screen. FIG. 8 shows how the loss function and the distance change as the optimization steps are repeated. As shown in FIG. 8, the value of the loss function decreases as the number of optimization steps increases, which means that the predicted evaluation value improves. Similarly, the distance decreases as the number of optimization steps increases and approaches the high-reliability region.

Next, a modification of the optimization processing device is described. FIG. 9 is an explanatory diagram of a modification of the optimization processing device. In FIG. 9, a prediction model generation device 50 and an optimization processing device 60 are connected. The prediction model generation device 50 performs the processing of generating a prediction model from learning data.

The optimization processing device 60 stores the prediction model generated by the prediction model generation device 50. The optimization processing device 60 also stores distribution data indicating the distribution of the learning data as information regarding the learning data.

When a problem is given, the optimization processing device 60 refers to the prediction model and the distribution data, performs optimization processing within the range limited by the distance from the distribution data, and outputs an optimal solution. When work is performed based on this optimal solution, the result of the optimization processing and the result of the work are associated with each other and accumulated as work records. If the accumulated work records are provided to the prediction model generation device 50 as additional learning data, the prediction model generation device 50 can perform learning again to generate a prediction model with improved accuracy and provide it to the optimization processing device 60.

As described above, the optimization processing device according to this embodiment includes the CPU 11 and the memory 12, which function as a processing unit that performs processing to search for an optimal solution to a combinatorial optimization problem, and the storage unit 15, which stores information regarding the learning data. The device generates, from the learning data, a prediction model that predicts the evaluation value of a combination, and limits the search range based on the distribution of the learning data when searching for an optimal solution using the prediction model. A model can therefore be constructed autonomously and applied to unknown problems.

Here, the search range is preferably larger than the distribution range of the learning data and contains it. With this setting, the distribution range of the learning data can be expanded and the prediction model can mature.

Specifically, the optimization processing device obtains the distance from the distribution of the learning data to the solution candidate data and sets, as the search range, the range in which this distance is less than or equal to a distance threshold, so that the search range can be determined simply. The optimal solution can also be searched for using a loss function based on the evaluation value and the distance.

In addition, by generating the prediction model using ranking learning, the optimization processing device can suppress the influence of the errors that arise when the actual index is predicted directly.
Furthermore, by using the records of work performed based on the optimal solution as additional learning data, the optimization processing device can improve the accuracy of the prediction model.

As a modification, the search range may be made smaller as learning data is added, bringing it closer to the distribution of the learning data (that is, reducing α in the loss function). In other words, the difference between the search range and the distribution of the learning data is made large at first so that expanding the distribution of the learning data takes priority, and after the distribution range of the learning data has become sufficiently large, the search range is brought closer to the distribution of the learning data so that the prediction model is stabilized.
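A minimal sketch of such a schedule follows. The text only states that α is reduced as learning data is added; the exponential decay, the parameter names, and the default values below are all hypothetical choices for illustration.

```python
import math

def alpha_schedule(num_samples: int,
                   alpha_max: float = 4.0,
                   alpha_min: float = 1.0,
                   scale: float = 500.0) -> float:
    """Distance threshold that shrinks from alpha_max toward alpha_min.

    num_samples : number of learning data items collected so far
    scale       : hypothetical constant controlling how fast alpha shrinks
    """
    return alpha_min + (alpha_max - alpha_min) * math.exp(-num_samples / scale)

for n in (0, 500, 2000, 10000):
    print(n, round(alpha_schedule(n), 3))
```

With few samples the threshold stays wide, which favors exploration beyond the learning data; as samples accumulate it approaches `alpha_min`, keeping the search close to the region where the prediction model is reliable.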

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to explain the present invention in an easily understandable manner, and the invention is not necessarily limited to configurations that include every element described. Furthermore, elements of a configuration may be not only deleted but also replaced or added.

For example, the embodiments above describe the case in which a computer operates as the optimization processing device 10 by executing the optimization processing program, but the invention can also be implemented as an optimization processing device in which the various functional units are configured as hardware.

10, 60: optimization processing device; 11: CPU; 12: memory; 13: display unit; 14: input unit; 15: storage unit; 21: prediction model generation unit; 22: distance calculation unit; 23: evaluation value prediction unit; 24: search unit; 31: learning data; 32: prediction model data; 50: prediction model generation device

Claims

1. An optimization processing device comprising:
a processing unit that performs processing to search for an optimal solution to a combinatorial optimization problem; and
a storage unit that stores information regarding learning data that is a set of past problems and solutions,
wherein the processing unit:
generates, by machine learning using the learning data, a prediction model that predicts the evaluation values of combinations that are solution candidates;
limits a search range based on the distribution of the learning data when obtaining, using the prediction model, the evaluation values of combinations that are candidate solutions to a given problem and searching for a combination with a higher evaluation value to find the optimal solution; and
obtains the distance from the distribution of the learning data to the solution candidate data and sets, as the search range, the range in which the distance is less than or equal to a distance threshold.

2. The optimization processing device according to claim 1, wherein the search range is larger than the distribution range of the learning data and includes the distribution range of the learning data.

3. The optimization processing device according to claim 1, wherein the processing unit searches for the optimal solution using a loss function based on the evaluation value and the distance.

4. The optimization processing device according to claim 1, wherein the processing unit generates the prediction model using ranking learning.

5. An optimization processing method for searching for an optimal solution to a combinatorial optimization problem, the method comprising, as steps performed by an optimization processing device:
a prediction model generation step of generating, by machine learning using learning data that is a set of past problems and solutions, a prediction model that predicts the evaluation values of combinations that are solution candidates; and
a search step of obtaining, using the prediction model, the evaluation values of combinations that are candidate solutions to a given problem, and searching for a combination with a higher evaluation value to find the optimal solution,
wherein, in the search step, the search range is the range in which the distance from the distribution of the learning data to the solution candidate data is less than or equal to a distance threshold.

6. An optimization processing program for searching for an optimal solution to a combinatorial optimization problem, the program causing a computer to execute:
a prediction model generation procedure of generating, by machine learning using learning data that is a set of past problems and solutions, a prediction model that predicts the evaluation values of combinations that are solution candidates; and
a search procedure of obtaining, using the prediction model, the evaluation values of combinations that are candidate solutions to a given problem, and searching for a combination with a higher evaluation value to find the optimal solution,
wherein, in the search procedure, the search range is the range in which the distance from the distribution of the learning data to the solution candidate data is less than or equal to a distance threshold.