JP6804009B2

JP6804009B2 - Learning devices, learning methods, and learning programs

Info

Publication number: JP6804009B2
Application number: JP2020513128A
Authority: JP
Inventors: 豪啓安藤; 理貴近藤
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2018-04-11
Filing date: 2019-03-13
Publication date: 2020-12-23
Anticipated expiration: 2039-03-13
Also published as: WO2019198408A1; JPWO2019198408A1

Description

The present disclosure relates to a learning device, a learning method, and a learning program.

A data analysis device has been proposed that, based on the relationship between data having a first result value and data having a second result value, extracts evaluation items, and the values of those items, for changing the data having the second result value into data having the first result value (see JP-A-2000-305941). When the value of an extracted evaluation item is to be changed, this data analysis device examines the influence on the result value and calculates the effect of changing the value of the extracted evaluation item.

A data set selection device has also been proposed that uses an active learning device containing a plurality of different prediction algorithms to make each prediction algorithm learn the correspondence between a plurality of attribute values of training data and the corresponding output values (see JP-A-2007-304782). Using the plurality of correspondences learned by the respective prediction algorithms, this data set selection device predicts the output value corresponding to prediction data and acquires a prediction result value for each of the prediction algorithms. The device then selects, from the data set of prediction data, the data for which the prediction result values obtained by the plurality of prediction algorithms vary widely.

Further, a technique has been proposed for calculating a model of an output quantity of a technical system, the output quantity depending nonlinearly on a plurality of input quantities in the form of an input quantity vector (see JP-A-2016-530585).

In material research and development, experiments are repeated in order to search for a material with better performance. In this case, it would improve the efficiency of research and development if appropriate experimental conditions could be newly searched for from combinations of the experimental conditions used in past experiments for producing a material and the performance values of the experimental results.

The technique described in JP-A-2000-305941 can search only for data similar to the data existing in a database. Therefore, even if it is applied to a method for searching for experimental conditions for producing a material, it may not always be able to find appropriate experimental conditions. Further, the techniques described in JP-A-2007-304782 and JP-A-2016-530585 do not consider searching for new experimental conditions in the first place. This problem can arise not only in the research and development of materials but also in the research and development of drugs.

The present disclosure has been made in view of the above circumstances, and its object is to make it possible to search for appropriate experimental conditions for producing a material or a drug.

The learning device of the present disclosure includes a derivation unit and a learning unit. The derivation unit uses an output model that takes as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results, and that outputs experimental conditions. The derivation unit inputs the plurality of combinations into the output model, inputs the experimental conditions output by the output model into an experimental model that performs a virtual experiment, and derives an evaluation value of the output model using the performance value of the experimental result thus obtained. The learning unit trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.

This makes it possible to search for appropriate experimental conditions for producing a material or a drug.

In the learning device of the present disclosure, the evaluation value may be a value that is better the higher the ratio of values satisfying the target performance among a plurality of performance values, a value that is better the smaller the number of virtual experiments needed until a performance value satisfying the target performance is obtained, or a value that is better the closer the performance value is to the target performance.

As a result, the search behavior is evaluated using an appropriate value as the performance value, which determines an appropriate output model, so that appropriate experimental conditions for producing a material or a drug can be searched for.
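The three kinds of evaluation value described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that a smaller performance value is better and that a performance value meets the target when it is less than or equal to the target; neither the names nor the exact formulas come from the disclosure.

```python
from typing import Sequence


def ratio_meeting_target(values: Sequence[float], target: float) -> float:
    """Fraction of performance values that meet the target (higher is better)."""
    if not values:
        return 0.0
    return sum(v <= target for v in values) / len(values)


def trials_until_target(values: Sequence[float], target: float) -> int:
    """1-based count of virtual experiments until the target is first met.

    Fewer is better. Returns len(values) + 1 when the target is never met.
    """
    for i, v in enumerate(values, start=1):
        if v <= target:
            return i
    return len(values) + 1


def closeness_to_target(value: float, target: float) -> float:
    """Negative distance to the target (closer to zero is better)."""
    return -abs(value - target)
```

Any one of these could serve as the score fed back to training; which one is suitable depends on whether the goal is a high hit rate, a fast first hit, or a result close to a specific target.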

Further, in the learning device of the present disclosure, the derivation unit may correct the evaluation value downward when the output model outputs experimental conditions that do not satisfy a predetermined rule.

This increases the likelihood of obtaining experimental conditions that satisfy predetermined rules based on, for example, past rules of thumb, so that appropriate experimental conditions for producing a material or a drug can be searched for.
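A downward correction of this kind might look like the following sketch. It assumes a larger evaluation value is better, represents each rule as a hypothetical predicate over the experimental condition, and subtracts a fixed penalty per violated rule; the disclosure does not specify the form of the correction.

```python
from typing import Callable, Mapping, Sequence

Condition = Mapping[str, float]
Rule = Callable[[Condition], bool]  # returns True when the rule is satisfied


def penalized_evaluation(value: float,
                         condition: Condition,
                         rules: Sequence[Rule],
                         penalty: float = 1.0) -> float:
    """Lower the evaluation value by `penalty` for every violated rule."""
    violations = sum(not rule(condition) for rule in rules)
    return value - penalty * violations
```

For example, a rule such as `lambda c: c["temperature_c"] <= 150.0` (an illustrative limit, not one from the source) would reduce the score of any proposed condition above that temperature.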

Further, in the learning device of the present disclosure, the derivation unit may correct the experimental conditions output from the output model into experimental conditions that can be used in an actual experiment.

This increases the likelihood of obtaining experimental conditions that can be used in actual experiments, so that appropriate experimental conditions for producing a material or a drug can be searched for.

Further, in the learning device of the present disclosure, the output model may be a model that is trained using a genetic algorithm.

This makes it possible to search for more appropriate experimental conditions for producing a material or a drug.

The learning device of the present disclosure includes a learning unit that uses an output model whose inputs are a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results, together with a candidate experimental condition, and whose output is an action value in reinforcement learning. The plurality of combinations and each of a plurality of different candidate experimental conditions are input into the output model, which outputs a plurality of action values. Of these, the candidate experimental conditions whose action values are equal to or greater than a predetermined value are input into an experimental model that performs a virtual experiment. The learning unit trains the output model using, as a reward, a value derived from the performance value of the experimental result thus obtained.

In the learning device of the present disclosure, the reward may be a value that is better the higher the ratio of values satisfying the target performance among a plurality of performance values, a value that is better the smaller the number of virtual experiments needed until a performance value satisfying the target performance is obtained, or a value that is better the closer the performance value is to the target performance.

As a result, an appropriate value is used as the performance value, so that appropriate experimental conditions for producing a material or a drug can be searched for.

Further, in the learning device of the present disclosure, the reinforcement learning may be Q-learning, and the action value may be a Q value.
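The candidate selection and reward steps of this aspect can be sketched as follows. The names `select_candidates` and `rewards_for` are hypothetical, the Q function is passed in as a plain callable standing in for the output model, and the reward formula (negative distance to the target) is only one of the options named above. The Q-value update itself is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Condition = Tuple[float, ...]
History = Sequence[Tuple[Condition, float]]  # (experimental condition, performance value)


def select_candidates(q_function: Callable[[History, Condition], float],
                      history: History,
                      candidates: Sequence[Condition],
                      threshold: float) -> List[Condition]:
    """Keep the candidates whose action value (Q value) is at least the threshold."""
    return [c for c in candidates if q_function(history, c) >= threshold]


def rewards_for(selected: Sequence[Condition],
                experimental_model: Callable[[Condition], float],
                target: float) -> List[float]:
    """Run a virtual experiment per selected candidate and convert it to a reward.

    Assumption: the reward is the negative distance to the target performance,
    so a performance value closer to the target earns a larger reward.
    """
    return [-abs(experimental_model(c) - target) for c in selected]
```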

Further, the learning device of the present disclosure may further include an output unit that, when the output model trained by the learning unit is used, sequentially inputs a plurality of candidate experimental conditions into the output model a plurality of times and outputs the candidate whose cumulative action value is the largest as the candidate experimental condition to be tested next.

Further, in the learning device of the present disclosure, the experimental model may be a model obtained by machine learning.

This makes it possible to generate an output model specialized for a specific problem.

Further, in the learning device of the present disclosure, a plurality of experimental models may exist, and the creation conditions of the plurality of experimental models may differ from one another.

As a result, by using a plurality of virtual experimental results obtained from experimental models with different creation conditions, more appropriate experimental conditions for producing a material or a drug can be searched for. Further, the experimental model may be a mathematical formula that includes a function such as sin or exp. This makes it possible to generate an output model even for an experimental system for which no experimental data has been obtained.
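As an illustration of a formula-based experimental model, the sketch below maps three condition items to a single performance value using sin and exp. The formula, its coefficients, and the parameter names are arbitrary assumptions chosen only to show the idea; they do not describe any real material.

```python
import math


def formula_experimental_model(main_ratio: float,
                               additive_conc: float,
                               temperature_c: float) -> float:
    """Return a made-up performance value (smaller is assumed to be better)."""
    return (math.exp(-additive_conc)
            + 0.5 * math.sin(main_ratio * math.pi)
            + abs(temperature_c - 100.0) / 100.0)
```

A function like this can stand in for the virtual experiment while the output model is being trained, before any measured data exists.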

Further, in the learning device of the present disclosure, a plurality of output models may exist, and the creation conditions of the plurality of output models may differ from one another.

This makes it possible to search for more appropriate experimental conditions for producing a material or a drug by learning from the evaluation of a plurality of performance values obtained from a plurality of experimental conditions produced by output models with different creation conditions.

The learning method of the present disclosure is a method in which a computer executes processing of: inputting a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results into an output model that takes the plurality of combinations as input and outputs experimental conditions; inputting the experimental conditions thus output into an experimental model that performs a virtual experiment; deriving an evaluation value of the output model using the performance value of the experimental result thus obtained; and training the output model by machine learning that reflects the derived evaluation value.

The learning program of the present disclosure causes a computer to execute processing of: inputting a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results into an output model that takes the plurality of combinations as input and outputs experimental conditions; inputting the experimental conditions thus output into an experimental model that performs a virtual experiment; deriving an evaluation value of the output model using the performance value of the experimental result thus obtained; and training the output model by machine learning that reflects the derived evaluation value.

The learning method of the present disclosure is also a method in which a computer executes processing of training an output model whose inputs are a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results, together with a candidate experimental condition, and whose output is an action value in reinforcement learning. In this processing, the plurality of combinations and each of a plurality of different candidate experimental conditions are input into the output model; of the plurality of action values thus output, the candidate experimental conditions whose action values are equal to or greater than a predetermined value are input into an experimental model that performs a virtual experiment; and the output model is trained using, as a reward, a value derived from the performance value of the experimental result thus obtained.

The learning program of the present disclosure also causes a computer to execute processing of training an output model whose inputs are a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results, together with a candidate experimental condition, and whose output is an action value in reinforcement learning. In this processing, the plurality of combinations and each of a plurality of different candidate experimental conditions are input into the output model; of the plurality of action values thus output, the candidate experimental conditions whose action values are equal to or greater than a predetermined value are input into an experimental model that performs a virtual experiment; and the output model is trained using, as a reward, a value derived from the performance value of the experimental result thus obtained.

Further, the learning device of the present disclosure has a processor that inputs a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results into an output model that takes the plurality of combinations as input and outputs experimental conditions, inputs the experimental conditions thus output into an experimental model that performs a virtual experiment, derives an evaluation value of the output model using the performance value of the experimental result thus obtained, and trains the output model by machine learning that reflects the derived evaluation value.

Further, the learning device of the present disclosure has a processor that trains an output model whose inputs are a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results, together with a candidate experimental condition, and whose output is an action value in reinforcement learning. The processor inputs the plurality of combinations and each of a plurality of different candidate experimental conditions into the output model, inputs the candidate experimental conditions whose action values are equal to or greater than a predetermined value into an experimental model that performs a virtual experiment, and trains the output model using, as a reward, a value derived from the performance value of the experimental result thus obtained.

According to the present disclosure, appropriate experimental conditions for producing a material or a drug can be searched for.

FIG. 1 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the first embodiment.
FIG. 2 is a diagram showing an example of the learning data according to each embodiment.
FIG. 3 is a diagram showing an example of the output model according to the first embodiment.
FIG. 4 is a diagram showing an example of the data structure of the data output from the output model according to the first embodiment.
FIG. 5 is a diagram showing an example of the experimental model according to each embodiment.
FIG. 6 is a diagram showing an example of the data structure of the data output from the experimental model according to each embodiment.
FIG. 7 is a diagram for explaining the process of deriving the evaluation value of the output model according to the first embodiment.
FIG. 8 is a diagram for explaining the process of deriving the evaluation value of the output model according to a modified example.
FIG. 9 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the first embodiment.
FIG. 10 is a block diagram showing an example of the hardware configuration of the learning device according to each embodiment.
FIG. 11 is a flowchart showing an example of the experimental model learning process according to each embodiment.
FIG. 12 is a flowchart showing an example of the output model learning process according to the first embodiment.
FIG. 13 is a flowchart showing an example of the experimental condition output process according to the first embodiment.
FIG. 14 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the second embodiment.
FIG. 15 is a diagram showing an example of the output model according to the second embodiment.
FIG. 16 is a diagram for explaining the process of deriving the evaluation value of the output model according to the second embodiment.
FIG. 17 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the second embodiment.
FIG. 18 is a flowchart showing an example of the output model learning process according to the second embodiment.
FIG. 19 is a flowchart showing an example of the experimental condition output process according to the second embodiment.

Hereinafter, example embodiments for carrying out the technique of the present disclosure will be described in detail with reference to the drawings.

[First Embodiment]
First, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the learning device 10 includes a derivation unit 12 and a learning unit 14. The storage unit 42 (see FIG. 10) of the learning device 10 stores learning data 20, a plurality of output models 22, and a plurality of experimental models 24.

FIG. 2 shows an example of the learning data 20. As shown in FIG. 2, the learning data 20 according to the present embodiment includes combinations of experimental conditions for producing a material and the performance value of the material obtained as the experimental result when an experiment is performed under those conditions. The experimental conditions are, for example, conditions for producing a material such as a semiconductor resist material, and include a main component composition, an additive amount, and a process condition. In the example of FIG. 2, the main component composition indicates the ratio of the main components of the material, the additive amount indicates the concentration of the additive, and the process condition indicates the temperature at which the material is produced.

The performance value in the learning data 20 indicates the performance of the material produced under the corresponding experimental conditions. The performance value according to the present embodiment is a measure of the quality of the material; examples include the degree of unevenness of the surface of the material and the degree to which a hole of the desired size has been formed. In the present embodiment, a smaller performance value indicates a better material. The learning data 20 of the present embodiment includes a plurality of combinations of different experimental conditions and performance values. The same experimental condition may appear more than once.
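One way to hold such records in code is sketched below. The class and field names are hypothetical, each item is simplified to a single number, and only the convention that a smaller performance value is better is taken from the description above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExperimentalCondition:
    main_component_ratio: float    # ratio of the main components
    additive_concentration: float  # concentration of the additive
    process_temperature_c: float   # temperature at which the material is produced


@dataclass(frozen=True)
class ExperimentRecord:
    condition: ExperimentalCondition
    performance: float  # smaller is better in this embodiment


def best_record(records: Sequence[ExperimentRecord]) -> ExperimentRecord:
    """Return the record with the smallest (best) performance value."""
    return min(records, key=lambda r: r.performance)
```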

FIG. 3 shows an example of the output model 22. As shown in FIG. 3, the output model 22 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. A plurality of combinations of experimental conditions and performance values are input to the input layer of the output model 22. The output layer of the output model 22 outputs one experimental condition. FIG. 4 shows an example of the data structure of the experimental condition output from the output layer of the output model 22. As shown in FIG. 4, the output layer of the output model 22 outputs an experimental condition including, for example, the main component composition, the additive amount, and the process condition.

In detail, the output model 22 is configured, for example, as shown in (1) to (3) below.
(1) Number of nodes in the input layer: N × M
Here, N represents the number of items in an experimental condition, and M represents the number of experiments.
(2) Configuration of the intermediate layers: ten convolutional layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU as the activation function.
(3) Number of nodes in the output layer: N × 1

The plurality of output models 22 according to the present embodiment have different model creation conditions. Specifically, the output models 22 differ from one another in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.

FIG. 5 shows an example of the experimental model 24. As shown in FIG. 5, the experimental model 24 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The experimental model 24 is a model that performs a virtual experiment, and one experimental condition is input to its input layer. The output layer of the experimental model 24 outputs the performance value of the experimental result corresponding to the one experimental condition input to the input layer. FIG. 6 shows an example of the data structure of the performance value output from the output layer of the experimental model 24. The experimental model 24 may output a plurality of types of performance values. In this case, for example, the experimental model 24 outputs both the degree of unevenness of the surface of the material and the photosensitivity of the material as performance values of the material.

In detail, the experimental model 24 is configured, for example, as shown in (4) to (6) below.
(4) Number of nodes in the input layer: N × 1
Here, N represents the number of items in an experimental condition.
(5) Configuration of the intermediate layers: four convolutional layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU as the activation function.
(6) Number of nodes in the output layer: 1 × J
Here, J represents the number of types of performance values.

The plurality of experimental models 24 according to the present embodiment also have different model creation conditions. Specifically, the experimental models 24 differ from one another in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.

The derivation unit 12 inputs a plurality of combinations of experimental conditions for producing a material and performance values of experimental results into the output model 22, and acquires the experimental condition output from the output model 22. Specifically, the derivation unit 12 first inputs all the combinations of experimental conditions and performance values included in the learning data 20 into the output model 22 and acquires the experimental condition output from the output model 22. The derivation unit 12 may instead input only some of the combinations included in the learning data 20 into the output model 22, or may input a plurality of combinations of experimental conditions and performance values that are different from the learning data 20.

The derivation unit 12 also corrects the experimental condition output from the output model 22 into an experimental condition that can be used in an actual experiment. In the present embodiment, the derivation unit 12 corrects the experimental condition output from the output model 22 to the closest experimental condition that satisfies the constraints of the experimental apparatus actually used. For example, suppose that, according to the specifications of the experimental apparatus, the temperature in the process condition can be set only in units of 5 °C, and the temperature of the process condition in the experimental condition output from the output model 22 is not a multiple of 5 °C (for example, 92.3 °C). In that case, the derivation unit 12 corrects the temperature output from the output model 22 to the nearest multiple of 5 (for example, 90 °C).
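The temperature correction in this example amounts to rounding to the nearest multiple of the apparatus step. A minimal sketch, with the hypothetical helper name `snap_to_step`:

```python
def snap_to_step(value: float, step: float) -> float:
    """Round `value` to the nearest multiple of `step`.

    Note: Python's round() resolves exact ties to the even multiple,
    which is one possible tie-breaking choice; the source does not specify one.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return round(value / step) * step
```

With a 5 °C step, `snap_to_step(92.3, 5.0)` gives 90.0, matching the example in the text.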

次に、導出部１２は、補正して得られた実験条件を各実験モデル２４に入力し、各実験モデル２４から出力された性能値をそれぞれ取得する。 Next, the derivation unit 12 inputs the corrected experimental conditions to each experimental model 24, and acquires the performance values output from each experimental model 24, respectively.

Further, for each of the plural sets of combinations of experimental conditions and performance values input to the output model 22, the derivation unit 12 adds the combination of the experimental condition input to the corresponding experimental model 24 and the performance value derived from it, thereby obtaining plural extended sets of combinations. The derivation unit 12 then inputs the extended sets into the output model 22 again, and inputs each of the resulting experimental conditions into the corresponding experimental model 24. In this way, the derivation unit 12 again obtains a performance value for the experimental condition input to each corresponding experimental model 24. The derivation unit 12 repeats this cycle, adding the new combination to the sets input to the output model 22 and obtaining performance values again with the corresponding experimental models 24, a predetermined number of times (for example, 100 times).
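The repeated cycle of proposing a condition, correcting it, predicting its performance, and appending the result to the history can be sketched as below. This is a simplified sketch, not the patented implementation: it assumes a single experimental model (the text uses several, each with its own history), and `run_virtual_experiments`, `propose`, `surrogate`, and `snap` are hypothetical stand-ins.

```python
def run_virtual_experiments(output_model, experiment_model, correct, history, n_trials=100):
    """Run `n_trials` virtual experiments and return the grown history.

    output_model:     callable(history) -> proposed experimental condition
    experiment_model: callable(condition) -> predicted performance value
    correct:          callable(condition) -> condition usable on the equipment
    history:          list of (condition, performance) pairs
    """
    history = list(history)  # copy so the caller's data is not modified
    performances = []
    for _ in range(n_trials):
        condition = correct(output_model(history))
        performance = experiment_model(condition)
        history.append((condition, performance))  # feeds the next proposal
        performances.append(performance)
    return history, performances


# Toy stand-ins: the whole experimental condition is a single temperature.
propose = lambda hist: hist[-1][0] + 3.3    # hypothetical output model
surrogate = lambda temp: abs(temp - 120)    # hypothetical experimental model
snap = lambda temp: round(temp / 5) * 5     # 5-degree equipment step

final_history, perf = run_virtual_experiments(propose, surrogate, snap, [(90, 30)], n_trials=3)
print(perf)
```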

また、導出部１２は、以上の処理を各出力モデル２２に対して行う。すなわち、導出部１２は、各出力モデル２２について、所定回数分の出力モデル２２から出力された実験条件と、その実験条件に対応する性能値との複数の組み合わせを得る。 Further, the derivation unit 12 performs the above processing for each output model 22. That is, the derivation unit 12 obtains a plurality of combinations of the experimental conditions output from the output model 22 for a predetermined number of times and the performance values corresponding to the experimental conditions for each output model 22.

For each output model 22, the derivation unit 12 derives an evaluation value of the output model 22 using the performance values obtained over the predetermined number of iterations. In the present embodiment, as shown in FIG. 7 as an example, the derivation unit 12 derives the evaluation value so that it is better the smaller the number of virtual experiments (N in FIG. 7) needed before a performance value satisfying the target performance (in the present embodiment, a performance value at or below the target value) is obtained. In FIG. 7, the vertical axis shows the performance value and the horizontal axis shows the virtual experiment count, that is, which virtual experiment produced that performance value. The example of FIG. 7 shows that a performance value satisfying the target performance was first obtained in the N-th virtual experiment.
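One way to turn "fewer virtual experiments before the target is met" into a number is sketched below. The source does not give a formula, so the mapping `1 / N` (and the score of 0 when the target is never met) is purely an assumption for illustration, as are the function names.

```python
def trials_until_target(performances, target):
    """Return N, the 1-based index of the first value <= target, or None."""
    for n, value in enumerate(performances, start=1):
        if value <= target:
            return n
    return None


def evaluation_from_trials(performances, target):
    """Higher is better: 1/N if the target was met at trial N, else 0.0.

    The 1/N form is an assumed choice; any decreasing function of N would
    match the description in the text.
    """
    n = trials_until_target(performances, target)
    return 0.0 if n is None else 1.0 / n


print(trials_until_target([30, 25, 12, 9, 14], target=10))
print(evaluation_from_trials([30, 25, 12, 9, 14], target=10))
```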

Alternatively, as shown in FIG. 8 as an example, the derivation unit 12 may derive the evaluation value of the output model 22 so that it is better the higher the proportion of performance values that satisfy the target performance among those obtained over the predetermined number of iterations (in FIG. 8, the ratio of the number of performance values enclosed by the dash-dotted rectangle to the total number of performance values). "good" in FIG. 8 means that the target performance is satisfied. The derivation unit 12 may also derive the evaluation value so that it is better the closer each performance value is to the target value.
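The two alternative evaluation values described here could be computed along the following lines. Both functions are illustrative sketches: the names are hypothetical, and using the negative mean absolute distance as the "closeness" score is an assumption, since the source does not specify a formula.

```python
def good_ratio(performances, target):
    """Fraction of performance values that meet the target (value <= target)."""
    good = sum(1 for value in performances if value <= target)
    return good / len(performances)


def mean_closeness(performances, target):
    """Negative mean absolute distance to the target, so higher is better."""
    return -sum(abs(value - target) for value in performances) / len(performances)


values = [30, 25, 12, 9, 14, 8]
print(good_ratio(values, target=10))
print(mean_closeness([12, 8], target=10))
```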

The derivation unit 12 may also lower the evaluation value when the output model 22 outputs an experimental condition that does not satisfy predetermined rules. Examples of such rules are rules based on the user's experience, such as "material A and material B are never mixed" or "five or more kinds of materials are never mixed".
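A rule check and the corresponding downward correction might look like the sketch below. The two rules are the examples from the text; the constant names, the fixed `penalty` per violation, and the subtraction itself are assumptions, because the source only says the evaluation value is corrected to be lower.

```python
FORBIDDEN_PAIRS = [{"A", "B"}]  # "material A and material B are never mixed"
MAX_MATERIALS = 4               # "five or more kinds are never mixed"


def violates_rules(materials):
    """Return True if the set of mixed materials breaks any rule."""
    mats = set(materials)
    if len(mats) > MAX_MATERIALS:
        return True
    return any(pair <= mats for pair in FORBIDDEN_PAIRS)


def penalized_score(score, proposed_mixes, penalty=0.1):
    """Subtract an assumed fixed penalty for every rule-breaking proposal."""
    violations = sum(1 for mix in proposed_mixes if violates_rules(mix))
    return score - penalty * violations


print(violates_rules(["A", "B", "C"]))
print(penalized_score(0.5, [["A", "B"], ["A", "C"]]))
```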

学習部１４は、機械学習の一例としての誤差逆伝播法を用いて、実験モデル２４を学習させる。具体的には、学習部１４は、学習用データ２０に含まれる実験条件を実験モデル２４に入力し、実験モデル２４から出力された性能値を取得する。そして、学習部１４は、取得した性能値と、学習用データ２０に含まれる実験条件に対応する性能値との差が最小となるように、実験モデル２４を学習させる。学習部１４は、この実験モデル２４を学習させる処理を、学習用データ２０に含まれる全ての実験条件と性能値との組み合わせを用いて行う。なお、学習部１４は、学習用データ２０に含まれる一部の実験条件と性能値との複数の組み合わせを用いて実験モデル２４を学習させてもよい。また、学習部１４が各実験モデル２４を学習させる際に各実験モデル２４に入力するデータは、各実験モデル２４間で同じデータでもよいし、異なるデータでもよい。 The learning unit 14 trains the experimental model 24 by using the error backpropagation method as an example of machine learning. Specifically, the learning unit 14 inputs the experimental conditions included in the learning data 20 into the experimental model 24, and acquires the performance value output from the experimental model 24. Then, the learning unit 14 trains the experimental model 24 so that the difference between the acquired performance value and the performance value corresponding to the experimental conditions included in the learning data 20 is minimized. The learning unit 14 performs the process of training the experimental model 24 by using a combination of all the experimental conditions and the performance values included in the learning data 20. The learning unit 14 may train the experimental model 24 by using a plurality of combinations of some experimental conditions and performance values included in the learning data 20. Further, the data input to each experimental model 24 when the learning unit 14 trains each experimental model 24 may be the same data or different data among the experimental models 24.
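As a rough illustration of training an experimental model by error backpropagation, the sketch below fits a tiny one-hidden-layer network in pure Python. Everything about the architecture (four `tanh` hidden units, squared-error loss, per-sample gradient descent, the learning rate) is an assumption made for the example; the source only states that the difference between predicted and recorded performance values is minimized.

```python
import math
import random


def train_surrogate(data, hidden=4, lr=0.05, epochs=1000, seed=0):
    """Fit y ~ f(x) on (conditions, performance) pairs; return a predict function."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hi for w, hi in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in data:
            h, pred = forward(x)
            err = pred - y  # gradient of 0.5 * err**2 with respect to pred
            for j in range(hidden):
                # Backpropagate through the output weight and the tanh unit.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda x: forward(x)[1]


# Hypothetical learning data: one normalized condition, one performance value.
samples = [([0.0], 0.0), ([0.5], 1.0), ([1.0], 2.0)]
model = train_surrogate(samples)
print(model([0.5]))
```

Training several such models with different seeds or hidden sizes would be one way to obtain experimental models with different creation conditions.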

また、学習部１４は、各出力モデル２２について導出部１２により導出された評価値を用いて、最適化アルゴリズムの一例としての遺伝的アルゴリズムを用いた機械学習によって各出力モデル２２を学習させる。なお、この遺伝的アルゴリズムで用いられる個体の選択手法（例えば、ルーレット選択等）、交叉方法（例えば、二点交叉等）、及び突然変異の確率等のパラメータは、ユーザによって予め設定される。 Further, the learning unit 14 trains each output model 22 by machine learning using a genetic algorithm as an example of the optimization algorithm, using the evaluation values derived by the derivation unit 12 for each output model 22. Parameters such as an individual selection method (for example, roulette selection), a crossover method (for example, two-point crossover), and a mutation probability used in this genetic algorithm are preset by the user.
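The genetic operators named here (roulette selection, two-point crossover, mutation with a user-set probability) can be sketched on a flat list of weights. Representing an output model as a flat weight list, Gaussian mutation noise, and the function names are assumptions for illustration only.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumes non-negative fitness values with a positive sum.
    """
    return rng.choices(population, weights=fitnesses, k=1)[0]


def two_point_crossover(a, b, rng):
    """Copy `a`, but take the segment between two random cut points from `b`.

    Requires genomes of equal length with at least three genes.
    """
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genome, rate, scale, rng):
    """Perturb each gene with probability `rate` using Gaussian noise."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w for w in genome]


rng = random.Random(0)
child = two_point_crossover([0] * 6, [1] * 6, rng)
print(child)
```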

Specifically, for example, the learning unit 14 generates a new output model 22 by crossing the two output models 22 with the best evaluations. The crossover is performed, for example, by joining the input layer and the input-side half of the intermediate layers of one output model 22 to the output-side half of the intermediate layers and the output layer of the other output model 22. The crossover method is not limited to this example. For instance, the upper half of the input, intermediate, and output layers of one output model 22 as drawn in FIG. 3 may be joined to the lower half of the input, intermediate, and output layers of the other output model 22 as drawn in FIG. 3. In the present embodiment, the learning unit 14 generates the next generation of output models 22 with the genetic algorithm so that the number of output models 22 does not change between generations. In other words, the output models 22 are trained in the sense that the genetic algorithm updates their weight values. Through this training, the evaluation values derived by the derivation unit 12 are reflected in the output models 22.
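The first crossover variant, joining the input-side half of one parent to the output-side half of the other, can be sketched as follows. The sketch assumes both parents share one architecture and that a model is stored as a list of per-layer parameter blocks ordered from input to output; neither detail is stated in the source, and `crossover_by_depth` is a hypothetical name.

```python
import copy


def crossover_by_depth(parent_a, parent_b):
    """Return a child with the input-side half of A and the output-side half of B."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must share the same architecture")
    mid = len(parent_a) // 2
    # Deep copies keep later weight updates on the child from touching the parents.
    return copy.deepcopy(parent_a[:mid]) + copy.deepcopy(parent_b[mid:])


# Each inner list stands for one layer's weight matrix (toy 1x1 matrices here).
model_a = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
model_b = [[[10.0]], [[20.0]], [[30.0]], [[40.0]]]
child = crossover_by_depth(model_a, model_b)
print(child)
```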

上記の導出部１２による各出力モデル２２の評価値の導出処理、及び学習部１４による出力モデル２２群の学習処理は、所定の世代数（例えば、１万世代）だけ行われる。そして、学習部１４は、最終世代において評価値が示す評価が最も良い１つの出力モデル２２を、後述する運用フェーズで用いる出力モデル２２Ａとして記憶部４２に記憶する。なお、上記の導出部１２による各出力モデル２２の評価値の導出処理、及び学習部１４による出力モデル２２群の学習処理は、評価値が収束するまで行ってもよい。 The derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 are performed for a predetermined number of generations (for example, 10,000 generations). Then, the learning unit 14 stores in the storage unit 42 one output model 22 having the best evaluation indicated by the evaluation value in the final generation as an output model 22A used in the operation phase described later. The derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 may be performed until the evaluation values converge.

次に、図９を参照して、本実施形態に係る運用フェーズにおける学習装置１０の機能的な構成について説明する。図９に示すように、学習装置１０は、受付部３０及び出力部３２を備える。また、学習装置１０の記憶部４２には、前述した学習フェーズで得られた出力モデル２２Ａが記憶される。 Next, with reference to FIG. 9, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described. As shown in FIG. 9, the learning device 10 includes a reception unit 30 and an output unit 32. Further, the storage unit 42 of the learning device 10 stores the output model 22A obtained in the learning phase described above.

受付部３０は、ユーザにより入力部４４（図１０参照）を介して入力された材料を生成するための実験条件と、実験結果の材料の性能値との複数の組み合わせを受け付ける。 The reception unit 30 receives a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 (see FIG. 10) and the performance value of the material of the experimental result.

出力部３２は、受付部３０により受け付けられた実験条件と性能値との複数の組み合わせを出力モデル２２Ａに入力し、出力モデル２２Ａから出力された実験条件を取得する。また、出力部３２は、学習フェーズにおける導出部１２と同様に、出力モデル２２Ａから出力された実験条件を実際の実験に使用可能な実験条件に補正する。そして、出力部３２は、補正して得られた実験条件を表示部４３（図１０参照）に出力する。ユーザは、表示部４３に表示された実験条件を目視し、必要に応じてその実験条件での実験を行う。なお、出力部３２は、補正して得られた実験条件を記憶部４２に出力（記憶）してもよい。 The output unit 32 inputs a plurality of combinations of the experimental conditions and the performance values received by the reception unit 30 into the output model 22A, and acquires the experimental conditions output from the output model 22A. Further, the output unit 32 corrects the experimental conditions output from the output model 22A to the experimental conditions that can be used in the actual experiment, similarly to the derivation unit 12 in the learning phase. Then, the output unit 32 outputs the corrected experimental conditions to the display unit 43 (see FIG. 10). The user visually observes the experimental conditions displayed on the display unit 43, and if necessary, conducts an experiment under the experimental conditions. The output unit 32 may output (store) the experimental conditions obtained by correction to the storage unit 42.

次に、図１０を参照して、学習装置１０のハードウェア構成について説明する。学習装置１０は、図１０に示すコンピュータによって実現される。図１０に示すように、学習装置１０は、ＣＰＵ（Central Processing Unit）４０、一時記憶領域としてのメモリ４１、及び不揮発性の記憶部４２を備える。また、学習装置１０は、液晶ディスプレイ等の表示部４３、及びキーボードとマウス等の入力部４４を備える。ＣＰＵ４０、メモリ４１、記憶部４２、表示部４３、及び入力部４４は、バス４５を介して接続される。 Next, the hardware configuration of the learning device 10 will be described with reference to FIG. The learning device 10 is realized by the computer shown in FIG. As shown in FIG. 10, the learning device 10 includes a CPU (Central Processing Unit) 40, a memory 41 as a temporary storage area, and a non-volatile storage unit 42. Further, the learning device 10 includes a display unit 43 such as a liquid crystal display and an input unit 44 such as a keyboard and a mouse. The CPU 40, the memory 41, the storage unit 42, the display unit 43, and the input unit 44 are connected via the bus 45.

記憶部４２は、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、及びフラッシュメモリ等によって実現される。記憶媒体としての記憶部４２には、学習プログラム５０が記憶される。ＣＰＵ４０は、学習プログラム５０を記憶部４２から読み出し、読み出した学習プログラム５０をメモリ４１に展開してから実行する。ＣＰＵ４０が学習プログラム５０を実行することによって、導出部１２、学習部１４、受付部３０、及び出力部３２として機能する。 The storage unit 42 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The learning program 50 is stored in the storage unit 42 as a storage medium. The CPU 40 reads the learning program 50 from the storage unit 42, expands the read learning program 50 into the memory 41, and then executes the learning program 50. When the CPU 40 executes the learning program 50, it functions as a derivation unit 12, a learning unit 14, a reception unit 30, and an output unit 32.

次に、図１１〜図１３を参照して、本実施形態に係る学習装置１０の作用を説明する。学習装置１０が学習プログラム５０を実行することにより、図１１に示す実験モデル学習処理、図１２に示す出力モデル学習処理、及び図１３に示す実験条件出力処理が実行される。図１１に示す実験モデル学習処理は、例えば、学習フェーズにおいて、ユーザによって入力部４４を介して実験モデル学習処理の実行指示が入力された場合に実行される。また、図１２に示す出力モデル学習処理は、例えば、学習フェーズにおいて、ユーザによって入力部４４を介して出力モデル学習処理の実行指示が入力された場合に実行される。また、図１３に示す実験条件出力処理は、例えば、運用フェーズにおいて、ユーザによって入力部４４を介して実験条件出力処理の実行指示が入力された場合に実行される。 Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 11 to 13. When the learning device 10 executes the learning program 50, the experimental model learning process shown in FIG. 11, the output model learning process shown in FIG. 12, and the experimental condition output process shown in FIG. 13 are executed. The experimental model learning process shown in FIG. 11 is executed, for example, when an execution instruction for the experimental model learning process is input by the user via the input unit 44 in the learning phase. Further, the output model learning process shown in FIG. 12 is executed, for example, when an execution instruction for the output model learning process is input by the user via the input unit 44 in the learning phase. Further, the experimental condition output process shown in FIG. 13 is executed, for example, when an execution instruction of the experimental condition output process is input by the user via the input unit 44 in the operation phase.

図１１のステップＳ１０で、学習部１４は、記憶部４２から学習用データ２０を読み出す。ステップＳ１２で、学習部１４は、それぞれモデルの作成条件が異なる複数の実験モデル２４を生成する。ステップＳ１４で、学習部１４は、ステップＳ１２の処理により生成された複数の実験モデル２４の中から、学習させる対象の１つの実験モデル２４を選択する。なお、ステップＳ１４の処理が繰り返し実行される際には、学習部１４は、それまでに未選択の実験モデル２４を選択する。 In step S10 of FIG. 11, the learning unit 14 reads the learning data 20 from the storage unit 42. In step S12, the learning unit 14 generates a plurality of experimental models 24 having different model creation conditions. In step S14, the learning unit 14 selects one experimental model 24 to be trained from the plurality of experimental models 24 generated by the process of step S12. When the process of step S14 is repeatedly executed, the learning unit 14 selects the experimental model 24 that has not been selected by then.

In step S16, as described above, the learning unit 14 trains the experimental model 24 selected in step S14 by error backpropagation, using the learning data 20 read in step S10. In step S18, the learning unit 14 stores the experimental model 24 trained in step S16 in the storage unit 42. In step S20, the learning unit 14 determines whether the processes of steps S14 to S18 have been completed for all of the experimental models 24 generated in step S12. If the determination is negative, the process returns to step S14; if affirmative, the experimental model learning process ends.

In step S30 of FIG. 12, the learning unit 14 generates a plurality of output models 22, each with different model creation conditions. In step S32, the derivation unit 12 inputs a plurality of combinations of experimental conditions for producing a material and performance values of the experimental results into each output model 22, and acquires the experimental condition output from each output model 22.

When step S32 is executed for the first time in a given generation of the output models 22 (that is, the very first time step S32 is executed, or the first time it is executed after a negative determination in step S46 described later), these combinations are all of the combinations of experimental conditions and performance values included in the learning data 20. When step S32 is executed for the second or subsequent time in a generation (that is, after a negative determination in step S40), the combinations are those input to the output model 22 in the previous step S32, plus the combination of experimental condition and performance value added in step S38 described later.

ステップＳ３４で、導出部１２は、前述したように、ステップＳ３２の処理により各出力モデル２２から出力された実験条件を、実際の実験に使用可能な実験条件に補正する。ステップＳ３６で、導出部１２は、ステップＳ３４の処理により補正されて得られた各実験条件を、各実験モデル２４に入力し、各実験モデル２４から出力された性能値をそれぞれ取得する。また、導出部１２は、各出力モデル２２について、出力モデル２２から出力された実験条件に対応して、実験条件と性能値との複数の組み合わせをそれぞれ保持する。 In step S34, as described above, the derivation unit 12 corrects the experimental conditions output from each output model 22 by the process of step S32 to the experimental conditions that can be used in the actual experiment. In step S36, the derivation unit 12 inputs each experimental condition obtained by being corrected by the process of step S34 into each experimental model 24, and acquires the performance value output from each experimental model 24, respectively. Further, the derivation unit 12 holds a plurality of combinations of the experimental conditions and the performance values for each output model 22 corresponding to the experimental conditions output from the output model 22.

In step S38, the derivation unit 12 adds a new combination to the plurality of combinations of experimental conditions and performance values that were input to the output model 22 in the current (immediately preceding) step S32. Specifically, the derivation unit 12 adds the combination of the experimental condition input to the experimental model 24 in the current step S36 and the performance value obtained for it. The resulting combinations are used in the next execution of step S32, after a negative determination in step S40 described later.

In step S40, the derivation unit 12 determines whether the processes of steps S32 to S38 have been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S32; if affirmative, the process proceeds to step S42.

In step S42, as described above, the derivation unit 12 derives, for each output model 22, the evaluation value of the output model 22 using the performance values obtained over the predetermined number of repetitions of steps S32 to S38. In step S44, as described above, the learning unit 14 generates the next generation of output models 22 with the genetic algorithm, using the evaluation values derived in step S42 for each output model 22. These next-generation output models 22 are used in the next execution of step S32, after a negative determination in step S46 described later.

ステップＳ４６で、学習部１４は、出力モデル２２の世代数が所定の世代数（例えば、１万世代）に達したか否かを判定する。この判定が否定判定となった場合は、処理はステップＳ３２に戻り、肯定判定となった場合は、処理はステップＳ４８に移行する。ステップＳ４８で、学習部１４は、前述したように、最終世代において評価値が示す評価が最も良い１つの出力モデル２２を出力モデル２２Ａとして記憶部４２に記憶する。ステップＳ４８の処理が終了すると、出力モデル学習処理が終了する。 In step S46, the learning unit 14 determines whether or not the number of generations of the output model 22 has reached a predetermined number of generations (for example, 10,000 generations). If this determination is a negative determination, the process returns to step S32, and if the determination is affirmative, the process proceeds to step S48. In step S48, as described above, the learning unit 14 stores in the storage unit 42 as the output model 22A one output model 22 having the best evaluation value indicated by the evaluation value in the final generation. When the process of step S48 is completed, the output model learning process is completed.
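The overall flow of steps S30 to S48 can be condensed into a short generation loop. This is a heavily simplified sketch under several assumptions: each output model is a flat weight list, `evaluate` stands in for the whole virtual-experiment loop of steps S32 to S42, children come from one-point crossover of the two best models, the best model is carried over unchanged (elitism), and the final generation is re-evaluated before the best model is returned. None of these details is fixed by the source.

```python
import random


def evolve(population, evaluate, generations, rng, mutation_rate=0.1):
    """Evolve weight lists for a fixed number of generations; return the best one."""
    for _ in range(generations):                              # S46: generation count
        scores = [evaluate(g) for g in population]            # S32-S42: evaluation values
        ranked = [g for _, g in sorted(zip(scores, population),
                                       key=lambda pair: pair[0], reverse=True)]
        best_a, best_b = ranked[0], ranked[1]
        next_gen = [list(best_a)]                             # assumed elitism
        while len(next_gen) < len(population):                # S44: constant population size
            cut = rng.randrange(1, len(best_a))
            child = best_a[:cut] + best_b[cut:]
            child = [w + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else w
                     for w in child]
            next_gen.append(child)
        population = next_gen
    scores = [evaluate(g) for g in population]
    return max(zip(scores, population), key=lambda pair: pair[0])[1]   # S48


rng = random.Random(1)
initial = [[rng.uniform(-2, 2) for _ in range(4)] for _ in range(6)]
fitness = lambda g: -sum((w - 1.0) ** 2 for w in g)   # toy evaluation value
best = evolve(initial, fitness, generations=30, rng=rng)
print(fitness(best))
```

Because the best model of each generation is carried over unchanged, the best evaluation value in this sketch never decreases from one generation to the next.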

図１３のステップＳ５０で、受付部３０は、ユーザにより入力部４４を介して入力された材料を生成するための実験条件と、実験結果の材料の性能値との複数の組み合わせを受け付ける。ステップＳ５２で、出力部３２は、記憶部４２から出力モデル２２Ａを読み出す。ステップＳ５４で、出力部３２は、ステップＳ５０の処理により受け付けられた実験条件と性能値との複数の組み合わせを、ステップＳ５２の処理により読み出された出力モデル２２Ａに入力し、出力モデル２２Ａから出力された実験条件を取得する。 In step S50 of FIG. 13, the reception unit 30 receives a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance value of the material of the experimental result. In step S52, the output unit 32 reads the output model 22A from the storage unit 42. In step S54, the output unit 32 inputs a plurality of combinations of the experimental conditions and the performance values received by the process of step S50 into the output model 22A read by the process of step S52, and outputs from the output model 22A. Obtain the experimental conditions.

ステップＳ５６で、出力部３２は、前述したように、ステップＳ５４の処理により出力モデル２２Ａから出力された実験条件を実際の実験に使用可能な実験条件に補正する。ステップＳ５８で、出力部３２は、前述したように、ステップＳ５６の処理により補正された実験条件を表示部４３に出力する。ステップＳ５８の処理により、表示部４３には実験条件が表示される。ステップＳ５８の処理が終了すると、実験条件出力処理が終了する。 In step S56, as described above, the output unit 32 corrects the experimental conditions output from the output model 22A by the process of step S54 to the experimental conditions that can be used in the actual experiment. In step S58, the output unit 32 outputs the experimental conditions corrected by the process of step S56 to the display unit 43 as described above. By the process of step S58, the experimental conditions are displayed on the display unit 43. When the process of step S58 is completed, the experimental condition output process is completed.

以上説明したように、本実施形態によれば、材料を生成するための実験条件と実験結果の性能値との複数の組み合わせを入力とし、実験条件を出力とした出力モデル２２により得られた実験条件を、仮想的な実験を行う実験モデル２４に入力する。また、この入力により得られた実験結果の性能値を用いて出力モデル２２の評価値を導出する。そして、導出した出力モデル２２の評価値を用いて出力モデル２２を機械学習によって学習させる。従って、このように学習された出力モデル２２を用いることによって、材料の適切な実験条件を探索することができる。 As described above, according to the present embodiment, the experiment obtained by the output model 22 in which a plurality of combinations of the experimental conditions for producing the material and the performance values of the experimental results are input and the experimental conditions are output. The conditions are input to the experimental model 24 for performing a virtual experiment. Further, the evaluation value of the output model 22 is derived using the performance value of the experimental result obtained by this input. Then, the output model 22 is trained by machine learning using the derived evaluation values of the output model 22. Therefore, by using the output model 22 learned in this way, it is possible to search for appropriate experimental conditions for the material.

[Second Embodiment]
A second embodiment of the disclosed technique will be described. The same components as those in the first embodiment are designated by the same reference numerals, and the description thereof will be omitted. First, with reference to FIG. 14, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described. As shown in FIG. 14, the learning device 10 includes a derivation unit 12A, a learning unit 14A, and a generation unit 16. The storage unit 42 stores the learning data 20, the plurality of output models 22B, and the plurality of experimental models 24.

FIG. 15 shows an example of the output model 22B. As shown in FIG. 15, the output model 22B according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The input layer receives a plurality of combinations of experimental conditions for producing a material and performance values of the experimental results, together with one candidate experimental condition. The output layer outputs a Q value, which is an example of an action value in reinforcement learning. That is, the learning device 10 according to the present embodiment trains the output model 22B by Q-learning, an example of reinforcement learning, treating the plurality of combinations of experimental conditions and performance values as the current state s and the candidate experimental condition as the action a. As with the output models 22 of the first embodiment, the plurality of output models 22B of the present embodiment each have different model creation conditions.

生成部１６は、複数の異なる実験条件の候補を生成する。本実施形態では、生成部１６は、予め定められた規則を満たし、かつ実際の実験に使用可能な実験条件の候補を生成する。この規則及び実際の実験に使用可能な実験条件については、第１実施形態と同様であるため、説明を省略する。具体的には、生成部１６は、複数の異なる実験条件の候補を生成する都度、予め定められた規則を満たし、かつ実際の実験に使用可能な実験条件の候補をランダムに生成する。 The generation unit 16 generates candidates for a plurality of different experimental conditions. In the present embodiment, the generation unit 16 generates candidates for experimental conditions that satisfy predetermined rules and can be used in an actual experiment. Since this rule and the experimental conditions that can be used in the actual experiment are the same as those in the first embodiment, the description thereof will be omitted. Specifically, the generation unit 16 randomly generates candidates for experimental conditions that satisfy predetermined rules and can be used in actual experiments each time a plurality of candidates for different experimental conditions are generated.
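Random generation of candidates that satisfy both the rules and the equipment constraints can be sketched with simple rejection sampling. The material list, the temperature range, and the candidate layout (a tuple of materials plus a temperature) are all hypothetical; only the two example rules and the 5 °C step come from the text.

```python
import random

MATERIALS = ["A", "B", "C", "D", "E", "F"]  # hypothetical material catalogue


def is_allowed(materials):
    """Example rules from the text: at most four materials, never A with B."""
    mats = set(materials)
    return len(mats) <= 4 and not {"A", "B"} <= mats


def generate_candidates(n, rng):
    """Draw `n` distinct random candidates that pass the rules.

    Temperatures are drawn directly on the 5-degree grid (50 to 200), so every
    candidate is already usable on the assumed equipment.
    """
    candidates = set()
    while len(candidates) < n:
        k = rng.randint(1, 4)
        materials = tuple(sorted(rng.sample(MATERIALS, k)))
        temperature = 5 * rng.randint(10, 40)
        if is_allowed(materials):  # reject rule-breaking draws and try again
            candidates.add((materials, temperature))
    return sorted(candidates)


for candidate in generate_candidates(3, random.Random(0)):
    print(candidate)
```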

The derivation unit 12A derives a value (hereinafter, "reward value") that the learning unit 14A, described later, uses as the reward when training each output model 22B by Q-learning. The process by which the derivation unit 12A derives the reward value is described in detail below.

First, the derivation unit 12A inputs a plurality of combinations of experimental conditions for producing a material and performance values of the experimental results, together with a candidate experimental condition generated by the generation unit 16, into the output model 22B, and acquires the Q value output from the output model 22B. Specifically, as shown in FIG. 16, the derivation unit 12A inputs the plurality of combinations together with one of the candidates generated by the generation unit 16 into the output model 22B, and does so separately for every generated candidate. The derivation unit 12A thus acquires a Q value output from the output model 22B for each of the candidates generated by the generation unit 16.

Next, the derivation unit 12A inputs into the experimental models 24 a candidate experimental condition corresponding to one of the acquired Q values that is at or above a predetermined value. In the present embodiment, the derivation unit 12A inputs the candidate corresponding to the largest of the acquired Q values into each experimental model 24, and acquires the performance value output from each experimental model 24. As with the derivation unit 12 of the first embodiment, the derivation unit 12A holds the resulting plural combinations of experimental conditions and performance values.
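Scoring every candidate separately and keeping the one with the largest Q value can be sketched as below. The callable `q_model(history, candidate)` is a hypothetical stand-in for the output model 22B, and the toy Q function is invented for the example.

```python
def select_candidate(q_model, history, candidates):
    """Score each candidate with the Q model and return the best one.

    q_model:    callable(history, candidate) -> Q value
    history:    list of (condition, performance) pairs (the state s)
    candidates: list of candidate conditions (the possible actions a)
    """
    q_values = [q_model(history, candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=q_values.__getitem__)
    return candidates[best_index], q_values


# Toy Q model that prefers temperatures close to 120 degrees.
toy_q = lambda history, temperature: -abs(temperature - 120)
chosen, scores = select_candidate(toy_q, [(90, 30)], [90, 115, 130])
print(chosen, scores)
```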

Further, the derivation unit 12A adds the combination of the experimental condition input to the experimental model 24 and the derived performance value to the plurality of combinations input to the output model 22B, obtaining an extended set of combinations. The derivation unit 12A then inputs the extended set together with one of the candidates generated by the generation unit 16 into the output model 22B again, separately for every generated candidate. As in the processing described above, the derivation unit 12A uses the Q values output from the output model 22B for the respective candidates and the experimental models 24 to acquire the performance value for the selected candidate. The derivation unit 12A repeats this process a predetermined number of times (for example, 100 times).

また、導出部１２Ａは、以上の処理を各出力モデル２２Ｂについて行う。すなわち、導出部１２Ａは、各出力モデル２２Ｂのそれぞれについて所定回数分の性能値を取得する。導出部１２Ａは、第１実施形態に係る導出部１２と同様に、各出力モデル２２Ｂについて、得られた所定回数分の性能値を用いて出力モデル２２Ｂの評価値を導出する（図７参照）。 The derivation unit 12A performs the above processing for each output model 22B. That is, the derivation unit 12A acquires performance values for the predetermined number of iterations for each of the output models 22B. Like the derivation unit 12 according to the first embodiment, the derivation unit 12A derives, for each output model 22B, the evaluation value of that output model 22B using the performance values obtained over the predetermined number of iterations (see FIG. 7).

また、導出部１２Ａは、導出した評価値が高い出力モデル２２Ｂほど高い報酬が得られるように報酬値を導出する。例えば、導出部１２Ａは、評価値が高い順番に上位３つの出力モデル２２Ｂの報酬値を「１」と導出し、下位３つの出力モデル２２Ｂの報酬値を「−１」と導出し、他の出力モデル２２Ｂの報酬値を「０」と導出する。 The derivation unit 12A also derives reward values such that an output model 22B with a higher derived evaluation value obtains a higher reward. For example, with the output models 22B ranked in descending order of evaluation value, the derivation unit 12A derives a reward value of "1" for the top three output models 22B, a reward value of "-1" for the bottom three output models 22B, and a reward value of "0" for the remaining output models 22B.
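The ranking-based reward in the example above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `derive_rewards`, the dictionary keyed by model name, and the parameter `k` are all hypothetical, and the sketch assumes that there are at least `2 * k` output models so that the top and bottom groups do not overlap.

```python
from typing import Dict


def derive_rewards(evaluations: Dict[str, float], k: int = 3) -> Dict[str, int]:
    """Map each output model's evaluation value to a reward of 1, 0, or -1.

    `evaluations` maps a (hypothetical) model name to its evaluation value,
    where a larger value means a better evaluation.
    """
    # Assumption: the top-k and bottom-k groups must not overlap.
    if len(evaluations) < 2 * k:
        raise ValueError("need at least 2 * k output models")

    # Rank model names from the highest evaluation value to the lowest.
    ranked = sorted(evaluations, key=lambda name: evaluations[name], reverse=True)

    # Every model starts with reward 0; the top k get 1 and the bottom k get -1.
    rewards = {name: 0 for name in ranked}
    for name in ranked[:k]:
        rewards[name] = 1
    for name in ranked[-k:]:
        rewards[name] = -1
    return rewards
```

Calling `derive_rewards` with eight models and the default `k = 3` yields three models with reward 1, three with reward -1, and two with reward 0, matching the example in the text.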

学習部１４Ａは、導出部１２Ａにより導出された報酬値をＱ学習における報酬ｒとして用いて、各出力モデル２２Ｂを学習させる。 The learning unit 14A trains each output model 22B by using the reward value derived by the derivation unit 12A as the reward r in Q-learning.

上記の導出部１２Ａによる各出力モデル２２Ｂの報酬値を導出するための処理、及び学習部１４Ａによる各出力モデル２２Ｂの学習処理は、所定の回数（例えば、１万回）だけ行われる。そして、学習部１４Ａは、最後の回において、評価値が示す評価が最も良い１つの出力モデル２２Ｂを、後述する運用フェーズで用いる出力モデル２２Ｃとして記憶部４２に記憶する。なお、上記の導出部１２Ａによる各出力モデル２２Ｂの報酬値を導出するための処理、及び学習部１４Ａによる各出力モデル２２Ｂの学習処理は、評価値が収束するまで行ってもよい。 The process for deriving the reward value of each output model 22B by the derivation unit 12A and the learning process of each output model 22B by the learning unit 14A are performed a predetermined number of times (for example, 10,000 times). Then, in the final round, the learning unit 14A stores one output model 22B having the best evaluation indicated by the evaluation value in the storage unit 42 as an output model 22C used in the operation phase described later. The process for deriving the reward value of each output model 22B by the derivation unit 12A and the learning process of each output model 22B by the learning unit 14A may be performed until the evaluation values converge.

また、学習部１４Ａは、第１実施形態に係る学習部１４と同様に、学習用データ２０を用いて、誤差逆伝播法に従って、実験モデル２４を学習させる。 Further, the learning unit 14A trains the experimental model 24 according to the error back propagation method using the learning data 20 as in the learning unit 14 according to the first embodiment.

次に、図１７を参照して、本実施形態に係る運用フェーズにおける学習装置１０の機能的な構成について説明する。図１７に示すように、学習装置１０は、生成部１６、受付部３０、及び出力部３２Ａを備える。また、学習装置１０の記憶部４２には、前述した学習フェーズで得られた出力モデル２２Ｃが記憶される。 Next, with reference to FIG. 17, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described. As shown in FIG. 17, the learning device 10 includes a generation unit 16, a reception unit 30, and an output unit 32A. Further, the storage unit 42 of the learning device 10 stores the output model 22C obtained in the learning phase described above.

出力部３２Ａは、受付部３０により受け付けられた実験条件と性能値との複数の組み合わせ、及び生成部１６により生成された複数の実験条件の候補の何れか１つを、生成された全ての実験条件の候補について、出力モデル２２Ｃに個別に入力する。出力部３２Ａは、この入力それぞれに対応して出力モデル２２Ｃから出力されたＱ値を取得する。そして、出力部３２Ａは、取得した複数のＱ値のうち、最大のＱ値に対応する実験条件の候補を、次に実験対象とする実験条件の候補として表示部４３に出力する。なお、出力部３２Ａは、取得した複数のＱ値のうち、所定値以上のＱ値の何れか（例えば、所定値以上で、かつ２番目に大きいＱ値）に対応する実験条件の候補を、次に実験対象とする実験条件の候補として表示部４３に出力してもよい。また、出力部３２Ａは、取得した複数のＱ値のうち、最大のＱ値に対応する実験条件の候補を、次に実験対象とする実験条件の候補として記憶部４２に出力（記憶）してもよい。 The output unit 32A inputs into the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values received by the reception unit 30 together with one of the plurality of candidate experimental conditions generated by the generation unit 16. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs. The output unit 32A then outputs to the display unit 43 the candidate experimental condition corresponding to the maximum of the acquired Q values as the candidate experimental condition to be tested next. Note that the output unit 32A may instead output to the display unit 43, as the candidate to be tested next, the candidate experimental condition corresponding to any one of the acquired Q values that is equal to or greater than a predetermined value (for example, the second-largest Q value among those equal to or greater than the predetermined value). The output unit 32A may also output (store) the candidate experimental condition corresponding to the maximum of the acquired Q values to the storage unit 42 as the candidate experimental condition to be tested next.
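The selection rule described for the output unit 32A can be sketched as below. The sketch assumes the Q values have already been obtained from the output model 22C, one per candidate. The function name `select_candidate` and the parameters `threshold` and `rank` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

C = TypeVar("C")


def select_candidate(
    candidates: Sequence[C],
    q_values: Sequence[float],
    threshold: Optional[float] = None,
    rank: int = 0,
) -> C:
    """Pick the candidate whose Q value has the given rank (0 = largest).

    If `threshold` is given, only candidates whose Q value is at least
    `threshold` are considered, mirroring the "predetermined value" variant.
    """
    if len(candidates) != len(q_values):
        raise ValueError("candidates and q_values must have the same length")

    # Indices of the candidates, sorted from the largest Q value to the smallest.
    order: List[int] = sorted(
        range(len(candidates)), key=lambda i: q_values[i], reverse=True
    )
    if threshold is not None:
        order = [i for i in order if q_values[i] >= threshold]
    if rank >= len(order):
        raise ValueError("not enough candidates satisfy the threshold")
    return candidates[order[rank]]
```

With the defaults, this returns the candidate with the maximum Q value. Passing `threshold` and `rank=1` corresponds to the variant that outputs the candidate with the second-largest Q value among those at or above the predetermined value.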

本実施形態に係る学習装置１０のハードウェア構成は、第１実施形態に係る学習装置１０と同様（図１０参照）であるため、説明を省略する。ＣＰＵ４０が学習プログラム５０を実行することによって、導出部１２Ａ、学習部１４Ａ、生成部１６、受付部３０、及び出力部３２Ａとして機能する。 Since the hardware configuration of the learning device 10 according to the present embodiment is the same as that of the learning device 10 according to the first embodiment (see FIG. 10), the description thereof will be omitted. When the CPU 40 executes the learning program 50, it functions as a derivation unit 12A, a learning unit 14A, a generation unit 16, a reception unit 30, and an output unit 32A.

次に、図１８及び図１９を参照して、本実施形態に係る学習装置１０の作用を説明する。なお、実験モデル学習処理は、第１実施形態と同様（図１１参照）であるため、説明を省略する。図１８に示す出力モデル学習処理は、例えば、学習フェーズにおいて、ユーザによって入力部４４を介して出力モデル学習処理の実行指示が入力された場合に実行される。また、図１９に示す実験条件出力処理は、例えば、運用フェーズにおいて、ユーザによって入力部４４を介して実験条件出力処理の実行指示が入力された場合に実行される。 Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 18 and 19. Since the experimental model learning process is the same as that of the first embodiment (see FIG. 11), the description thereof will be omitted. The output model learning process shown in FIG. 18 is executed, for example, when an execution instruction for the output model learning process is input by the user via the input unit 44 in the learning phase. Further, the experimental condition output process shown in FIG. 19 is executed, for example, when an execution instruction of the experimental condition output process is input by the user via the input unit 44 in the operation phase.

図１８のステップＳ６０で、学習部１４は、それぞれモデルの作成条件が異なる複数の出力モデル２２Ｂを生成する。ステップＳ６０の処理により生成された各出力モデル２２Ｂについて以下のステップＳ６２〜Ｓ７０の処理が同様に実行される。ステップＳ６２で、生成部１６は、前述したように、複数の異なる実験条件の候補を生成する。 In step S60 of FIG. 18, the learning unit 14 generates a plurality of output models 22B having different model creation conditions. The following processes of steps S62 to S70 are similarly executed for each output model 22B generated by the process of step S60. In step S62, the generation unit 16 generates a plurality of candidates for different experimental conditions as described above.

ステップＳ６４で、導出部１２Ａは、前述したように、材料を生成するための実験条件と実験結果の性能値との複数の組み合わせ、及びステップＳ６２の処理により生成された実験条件の候補を出力モデル２２Ｂに入力し、出力モデル２２Ｂから出力されたＱ値を取得する。 In step S64, as described above, the derivation unit 12A outputs a plurality of combinations of the experimental conditions for producing the material and the performance values of the experimental results, and the candidates of the experimental conditions generated by the process of step S62 as an output model. Input to 22B and acquire the Q value output from the output model 22B.

なお、この実験条件と性能値との複数の組み合わせは、ステップＳ６４が出力モデル２２Ｂの学習処理における初回に実行される際（すなわち、初回にステップＳ６４が実行される際、又は後述するステップＳ７８の判定が否定判定となった後の初回にステップＳ６２が実行される際）には、学習用データ２０に含まれる全ての実験条件と性能値との組み合わせとされる。また、この実験条件と性能値との複数の組み合わせは、ステップＳ６４が出力モデル２２Ｂの学習処理における２回目以降に実行される際（すなわち、ステップＳ７０の判定が否定判定となった後にステップＳ６４が実行される際）には、前回のステップＳ６４で出力モデル２２Ｂに入力された実験条件と性能値との複数の組み合わせに、後述するステップＳ６８で実験条件と性能値との組み合わせが追加されたものとなる。 Note that when step S64 is executed for the first time in the learning process of the output model 22B (that is, when step S64 is executed for the very first time, or when step S62 is executed for the first time after the determination in step S78, described later, is negative), this plurality of combinations of experimental conditions and performance values consists of all the combinations of experimental conditions and performance values included in the learning data 20. When step S64 is executed for the second or a subsequent time in the learning process of the output model 22B (that is, when step S64 is executed after the determination in step S70 is negative), this plurality of combinations consists of the combinations input to the output model 22B in the previous step S64 plus the combination of experimental condition and performance value added in step S68, described later.

ステップＳ６６で、導出部１２Ａは、ステップＳ６４の処理により取得された複数のＱ値のうち、最大のＱ値に対応する実験条件の候補を各実験モデル２４に入力し、各実験モデル２４から出力された性能値を取得する。また、導出部１２Ａは、最大のＱ値に対応する実験条件の候補に対応して、実験条件と実験結果の性能値との複数の組み合わせを、それぞれ保持する。 In step S66, the derivation unit 12A inputs a candidate for the experimental condition corresponding to the maximum Q value among the plurality of Q values acquired by the process of step S64 into each experimental model 24, and outputs the candidate from each experimental model 24. Get the performance value. Further, the derivation unit 12A holds a plurality of combinations of the experimental conditions and the performance values of the experimental results, respectively, corresponding to the candidates of the experimental conditions corresponding to the maximum Q value.

ステップＳ６８で、導出部１２Ａは、今回（直前）のステップＳ６４の処理により出力モデル２２Ｂに入力された実験条件と性能値との複数の組み合わせに、以下に示す実験条件と性能値との組み合わせを追加する。すなわち、この場合、導出部１２Ａは、今回のステップＳ６６の処理により実験モデル２４に入力した実験条件と、取得された性能値との組み合わせを追加する。この追加を行うことにより得られた実験条件と性能値との複数の組み合わせは、後述するステップＳ７０の判定が否定判定となった後に、次に実行されるステップＳ６４で用いられる。 In step S68, the derivation unit 12A adds the following combination of experimental condition and performance value to the plurality of combinations that were input to the output model 22B in the current (immediately preceding) step S64. Specifically, the derivation unit 12A adds the combination of the experimental condition input to the experimental model 24 in the current step S66 and the performance value acquired there. The plurality of combinations obtained by this addition is used in the next execution of step S64, after the determination in step S70, described later, is negative.

ステップＳ７０で、導出部１２Ａは、ステップＳ６２〜ステップＳ６８の処理を、所定の回数（例えば、１００回）繰り返して実行したか否かを判定する。この判定が否定判定となった場合は、処理はステップＳ６２に戻り、肯定判定となった場合は、処理はステップＳ７２に移行する。 In step S70, the derivation unit 12A determines whether the processes of steps S62 to S68 have been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S72.
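The loop formed by steps S62 to S70 for one output model can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single experimental model (the embodiment feeds each of several experimental models 24), and it treats the output model and the experimental model as plain callables. The names `run_inner_loop`, `q_model`, `experiment_model`, and `generate_candidates` are hypothetical.

```python
from typing import Callable, List, Tuple

Condition = Tuple[float, ...]
History = List[Tuple[Condition, float]]


def run_inner_loop(
    q_model: Callable[[History, Condition], float],
    experiment_model: Callable[[Condition], float],
    generate_candidates: Callable[[], List[Condition]],
    initial_history: History,
    n_rounds: int = 100,
) -> Tuple[List[float], History]:
    """Repeat candidate generation, Q-value scoring, and a virtual experiment."""
    history = list(initial_history)  # copy so the caller's data is not modified
    performances: List[float] = []
    for _ in range(n_rounds):  # S70: stop after the predetermined number of rounds
        candidates = generate_candidates()  # S62: generate candidate conditions
        # S64: one Q value per candidate, given the combinations so far.
        q_values = [q_model(history, c) for c in candidates]
        # S66: run the virtual experiment on the candidate with the largest Q value.
        best = candidates[max(range(len(candidates)), key=lambda i: q_values[i])]
        performance = experiment_model(best)
        performances.append(performance)
        # S68: add the new (condition, performance) combination for the next round.
        history.append((best, performance))
    return performances, history
```

The returned list of performance values corresponds to the "performance values for the predetermined number of times" from which an evaluation value of the output model would then be derived.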

ステップＳ７２で、導出部１２Ａは、前述したように、各出力モデル２２Ｂについて、ステップＳ６２〜ステップＳ６８の繰り返し処理により得られた所定回数分の性能値を用いて、出力モデル２２Ｂの評価値を導出する。ステップＳ７４で、導出部１２Ａは、前述したように、ステップＳ７２の処理により導出された評価値が高い出力モデル２２Ｂほど高い報酬が得られるように報酬値を導出する。 In step S72, as described above, the derivation unit 12A derives, for each output model 22B, the evaluation value of that output model 22B using the performance values for the predetermined number of iterations obtained by repeating steps S62 to S68. In step S74, as described above, the derivation unit 12A derives reward values such that an output model 22B with a higher evaluation value derived in step S72 obtains a higher reward.

ステップＳ７６で、学習部１４Ａは、ステップＳ７４の処理により導出された報酬値をＱ学習における報酬ｒとして用いて、各出力モデル２２Ｂを学習させる。ステップＳ７８で、学習部１４は、ステップＳ６２〜ステップＳ７６の処理を、所定の回数（例えば、１万回）繰り返して実行したか否かを判定する。この判定が否定判定となった場合は、処理はステップＳ６２に戻り、肯定判定となった場合は、処理はステップＳ８０に移行する。ステップＳ８０で、学習部１４Ａは、前述したように、最後に実行されたステップＳ７２の処理により導出された評価値が示す評価が最も良い１つの出力モデル２２Ｂを出力モデル２２Ｃとして記憶部４２に記憶する。ステップＳ８０の処理が終了すると、出力モデル学習処理が終了する。 In step S76, the learning unit 14A trains each output model 22B by using the reward value derived in step S74 as the reward r in Q-learning. In step S78, the learning unit 14 determines whether the processes of steps S62 to S76 have been repeated a predetermined number of times (for example, 10,000 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S80. In step S80, as described above, the learning unit 14A stores in the storage unit 42, as the output model 22C, the one output model 22B whose evaluation value derived in the last execution of step S72 indicates the best evaluation. When the process of step S80 ends, the output model learning process ends.

図１９のステップＳ９０で、受付部３０は、ユーザにより入力部４４を介して入力された材料を生成するための実験条件と、実験結果の材料の性能値との複数の組み合わせを受け付ける。ステップＳ９２で、出力部３２Ａは、記憶部４２から出力モデル２２Ｃを読み出す。ステップＳ９４で、生成部１６は、前述したように、複数の異なる実験条件の候補を生成する。 In step S90 of FIG. 19, the reception unit 30 receives a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance value of the material of the experimental result. In step S92, the output unit 32A reads the output model 22C from the storage unit 42. In step S94, the generation unit 16 generates a plurality of candidates for different experimental conditions, as described above.

ステップＳ９６で、出力部３２Ａは、ステップＳ９０の処理により受け付けられた実験条件と性能値との複数の組み合わせ、及びステップＳ９２の処理により生成された複数の実験条件の候補の何れか１つを、生成された全ての実験条件の候補について、出力モデル２２Ｃに個別に入力する。出力部３２Ａは、この入力それぞれに対応して出力モデル２２Ｃから出力されたＱ値を取得する。 In step S96, the output unit 32A inputs into the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values received in step S90 together with one of the plurality of candidate experimental conditions generated by the process of step S92. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs.

ステップＳ９８で、出力部３２Ａは、ステップＳ９６の処理により取得された複数のＱ値のうち、最大のＱ値に対応する実験条件の候補を、次に実験対象とする実験条件の候補として表示部４３に出力する。ステップＳ９８の処理が終了すると、実験条件出力処理が終了する。 In step S98, the output unit 32A outputs to the display unit 43 the candidate experimental condition corresponding to the maximum of the Q values acquired in step S96 as the candidate experimental condition to be tested next. When the process of step S98 ends, the experimental condition output process ends.

以上説明したように、本実施形態によれば、材料を生成するための実験条件と実験結果の性能値との複数の組み合わせ、及び実験条件の候補を入力とし、Ｑ値を出力とした出力モデル２２Ｂにより得られたＱ値が最大となる実験条件の候補を実験モデル２４に入力する。また、この入力により得られた実験結果の性能値を用いて出力モデル２２Ｂの評価値を導出し、導出した評価値に応じて出力モデル２２Ｂに与える報酬を導出する。そして、導出した報酬を用いて出力モデル２２ＢをＱ学習によって学習させる。従って、このように学習された出力モデル２２Ｂを用いることによって、材料の適切な実験条件を探索することができる。 As described above, according to the present embodiment, the output model 22B takes as input a plurality of combinations of experimental conditions for producing a material and performance values of experimental results together with a candidate experimental condition, and outputs a Q value; the candidate experimental condition for which this Q value is largest is input to the experimental model 24. The evaluation value of the output model 22B is derived using the performance value of the experimental result obtained from this input, and the reward given to the output model 22B is derived according to the derived evaluation value. The output model 22B is then trained by Q-learning using the derived reward. Therefore, by using the output model 22B trained in this way, appropriate experimental conditions for the material can be searched for.

なお、上記各実施形態では、材料を生成するための実験条件を導出する場合について説明したが、これに限定されない。例えば、薬剤を生成するための実験条件を導出する形態としてもよい。 In each of the above embodiments, the case of deriving the experimental conditions for producing the material has been described, but the present invention is not limited to this. For example, it may be in the form of deriving experimental conditions for producing a drug.

また、上記各実施形態では、実験モデル２４として機械学習によって得られた学習済みモデルを適用した場合について説明したが、仮想的な実験が可能なモデルであれば、これに限定されない。例えば、実験モデル２４として、１つの実験条件を入力とし、入力された１つの実験条件に対応する実験結果の性能値を出力とした任意の関数を適用してもよい。このようなモデルを適用した場合でも出力モデル２２、２２Ｂが学習されることによって最適化される。また、例えば、実験モデル２４は、実験をシミュレーションするシミュレータであってもよい。 Further, in each of the above embodiments, the case where the trained model obtained by machine learning is applied as the experimental model 24 has been described, but the model is not limited to this as long as it is a model capable of virtual experiments. For example, as the experimental model 24, an arbitrary function may be applied in which one experimental condition is input and the performance value of the experimental result corresponding to the input experimental condition is output. Even when such a model is applied, it is optimized by learning the output models 22 and 22B. Further, for example, the experimental model 24 may be a simulator that simulates an experiment.

また、上記第２実施形態の運用フェーズにおいて、出力部３２Ａは、複数の実験条件の候補を出力モデル２２Ｃに逐次的に複数回入力することにより得られた累計のＱ値が最大となる実験条件の候補を次に実験対象とする実験条件の候補として出力してもよい。この場合、出力部３２Ａは、まず、第２実施形態と同様に、出力モデル２２Ｃから１回目の複数の実験条件の候補それぞれに対応するＱ値を得る。次に、出力部３２Ａは、例えば、１回目に出力モデル２２Ｃに入力した実験条件と性能値との複数の組み合わせに、１回目に出力モデル２２Ｃに入力した実験条件の候補と性能値との組み合わせを追加する。この性能値は、例えば、ＳＶＭ（Support Vector Machine）等の既知の手法により推定すればよい。そして、出力部３２Ａは、１回目と同様に、追加して得られた実験条件と性能値との複数の組み合わせ、及び２回目の複数の実験条件の候補それぞれを出力モデル２２Ｃに入力することにより出力モデル２２Ｃから２回目の複数の実験条件の候補それぞれに対応するＱ値を得る。この場合、出力部３２Ａは、１回目のＱ値と２回目のＱ値の累計値が最大となる実験条件の候補を次に実験対象とする実験条件の候補として出力する。なお、ここでは、２回のＱ値の累計値を用いる場合を説明したが、３回以上のＱ値の累計値を用いる場合も同様に可能である。 Further, in the operation phase of the second embodiment, the output unit 32A may output, as the candidate experimental condition to be tested next, the candidate experimental condition that maximizes the cumulative Q value obtained by sequentially inputting a plurality of candidate experimental conditions into the output model 22C a plurality of times. In this case, the output unit 32A first obtains from the output model 22C, as in the second embodiment, the Q value corresponding to each of the first-round candidate experimental conditions. Next, the output unit 32A adds, for example, the combination of a candidate experimental condition input to the output model 22C in the first round and its performance value to the plurality of combinations of experimental conditions and performance values input to the output model 22C in the first round. This performance value may be estimated by a known method such as an SVM (Support Vector Machine). Then, as in the first round, the output unit 32A inputs the plurality of combinations obtained by this addition and each of the second-round candidate experimental conditions into the output model 22C, and thereby obtains from the output model 22C the Q value corresponding to each of the second-round candidate experimental conditions. In this case, the output unit 32A outputs, as the candidate experimental condition to be tested next, the candidate experimental condition for which the cumulative value of the first-round Q value and the second-round Q value is largest. Although the case of using the cumulative value of Q values over two rounds has been described here, the cumulative value of Q values over three or more rounds can be used in the same way.
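One possible reading of the two-round cumulative Q value described above is sketched below. The text does not specify how first-round and second-round candidates are paired, so this sketch assumes a simple two-step lookahead: for each first-round candidate, the best second-round Q value reachable after adding that candidate is taken. That pairing, and the names `select_by_cumulative_q`, `q_model`, and `estimate_performance`, are assumptions made only for illustration. The performance estimator stands in for the known method (such as an SVM) mentioned in the text.

```python
from typing import Callable, List, Sequence, Tuple

Condition = Tuple[float, ...]
History = List[Tuple[Condition, float]]


def select_by_cumulative_q(
    q_model: Callable[[History, Condition], float],
    estimate_performance: Callable[[Condition], float],
    history: History,
    first_candidates: Sequence[Condition],
    second_candidates: Sequence[Condition],
) -> Tuple[Condition, float]:
    """Return the first-round candidate with the largest two-round cumulative Q."""
    if not first_candidates or not second_candidates:
        raise ValueError("both candidate lists must be non-empty")

    best_candidate = first_candidates[0]
    best_total = float("-inf")
    for c1 in first_candidates:
        # Round 1: Q value of the candidate given the known combinations.
        q1 = q_model(history, c1)
        # Add the candidate with an estimated performance value (assumed estimator).
        extended = history + [(c1, estimate_performance(c1))]
        # Round 2 (assumption): best Q value among the second-round candidates.
        q2 = max(q_model(extended, c2) for c2 in second_candidates)
        if q1 + q2 > best_total:
            best_candidate, best_total = c1, q1 + q2
    return best_candidate, best_total
```

Extending the sketch to three or more rounds would repeat the same pattern of extending the combinations and accumulating Q values, at the cost of evaluating more candidate sequences.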

また、上記各実施形態でＣＰＵがソフトウェア（プログラム）を実行することにより実行した各種処理を、ＣＰＵ以外の各種のプロセッサが実行してもよい。この場合のプロセッサとしては、ＦＰＧＡ（Field-Programmable Gate Array）等の製造後に回路構成を変更可能なＰＬＤ（Programmable Logic Device）、及びＡＳＩＣ（Application Specific Integrated Circuit）等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が例示される。また、上記各種処理を、これらの各種のプロセッサのうちの１つで実行してもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡ、及びＣＰＵとＦＰＧＡとの組み合わせ等）で実行してもよい。また、これらの各種のプロセッサのハードウェア的な構造は、より詳細には、半導体素子等の回路素子を組み合わせた電気回路である。 Further, the various processes that the CPU performs by executing software (a program) in each of the above embodiments may be executed by various processors other than the CPU. Examples of such processors include a PLD (Programmable Logic Device), such as an FPGA (Field-Programmable Gate Array), whose circuit configuration can be changed after manufacture, and a dedicated electric circuit, such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively for executing a specific process. The various processes may be executed by one of these processors, or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these processors is an electric circuit in which circuit elements such as semiconductor elements are combined.

また、上記各実施形態では、学習プログラム５０が記憶部４２に予め記憶（インストール）されている態様を説明したが、これに限定されない。学習プログラム５０は、ＣＤ−ＲＯＭ（Compact Disk Read Only Memory）、ＤＶＤ−ＲＯＭ（Digital Versatile Disk Read Only Memory）、及びＵＳＢ（Universal Serial Bus）メモリ等の非一時的記録媒体に記録された形態で提供されてもよい。また、学習プログラム５０は、ネットワークを介して外部装置からダウンロードされる形態としてもよい。 Further, in each of the above embodiments, the mode in which the learning program 50 is stored (installed) in the storage unit 42 in advance has been described, but the present disclosure is not limited to this. The learning program 50 may be provided in a form recorded on a non-transitory recording medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The learning program 50 may also be downloaded from an external device via a network.

本願は２０１８年４月１１日出願の日本出願第２０１８−０７６００１号の優先権を主張すると共に、その全文を参照により本明細書に援用する。 This application claims the priority of Japanese Application No. 2018-076001 filed on April 11, 2018, the full text of which is incorporated herein by reference.

Claims

A learning device comprising:
a derivation unit that derives an evaluation value of an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results and outputting an experimental condition, by using the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the experimental condition output when the plurality of combinations are input to the output model; and
a learning unit that trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.

The learning device according to claim 1, wherein the evaluation value is a better value as the proportion of the plurality of performance values that satisfy a target performance is higher, a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller, or a better value as the performance value is closer to the target performance.

The learning device according to claim 1 or 2, wherein the derivation unit corrects the evaluation value to be low when experimental conditions that do not satisfy a predetermined rule are output from the output model.

The learning device according to any one of claims 1 to 3, wherein the derivation unit corrects the experimental conditions output from the output model to experimental conditions that can be used in an actual experiment.

The learning device according to any one of claims 1 to 4, wherein the output model is a model learned by using a genetic algorithm.

A learning device comprising a learning unit that trains an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results together with a candidate experimental condition and outputting an action value in reinforcement learning, wherein the learning unit uses, as a reward, a value derived based on the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the candidate experimental condition corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output when the plurality of combinations and each of a plurality of different candidate experimental conditions are input to the output model.

The higher the ratio of the values satisfying the target performance in the plurality of the performance values, the better the reward, or the smaller the number of virtual experiments until the performance value satisfying the target performance is obtained. The learning device according to claim 6, wherein the better the value is, or the closer the performance value is to the target performance, the better the value.

The reinforcement learning is Q-learning,
The learning device according to claim 6 or 7, wherein the action value is a Q value.

The learning device according to any one of claims 6 to 8, further comprising an output unit that, when the output model trained by the learning unit is used, outputs, as the candidate experimental condition to be tested next, the candidate experimental condition that maximizes the cumulative action value output by sequentially inputting a plurality of candidate experimental conditions into the output model a plurality of times.

The learning device according to any one of claims 1 to 9, wherein the experimental model is a model obtained by machine learning.

There are multiple experimental models,
The learning device according to any one of claims 1 to 10, wherein the plurality of experimental models have different model creation conditions.

There are multiple output models,
The learning device according to any one of claims 1 to 11, wherein the plurality of output models have different model creation conditions.

A learning method in which a computer executes a process comprising: deriving an evaluation value of an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results and outputting an experimental condition, by using the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the experimental condition output when the plurality of combinations are input to the output model; and
training the output model by machine learning that reflects the derived evaluation value.

A learning program for causing a computer to execute a process comprising: deriving an evaluation value of an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results and outputting an experimental condition, by using the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the experimental condition output when the plurality of combinations are input to the output model; and
training the output model by machine learning that reflects the derived evaluation value.

A learning method in which a computer executes a process of training an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results together with a candidate experimental condition and outputting an action value in reinforcement learning, the process using, as a reward, a value derived based on the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the candidate experimental condition corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output when the plurality of combinations and each of a plurality of different candidate experimental conditions are input to the output model.

A learning program for causing a computer to execute a process of training an output model, the output model taking as input a plurality of combinations of experimental conditions for producing a material or a drug and performance values of experimental results together with a candidate experimental condition and outputting an action value in reinforcement learning, the process using, as a reward, a value derived based on the performance value of the experimental result obtained by inputting, into an experimental model for performing a virtual experiment, the candidate experimental condition corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output when the plurality of combinations and each of a plurality of different candidate experimental conditions are input to the output model.