JP2020035325A

JP2020035325A - Design system, learned model generation method, and design program

Info

Publication number: JP2020035325A
Application number: JP2018163138A
Authority: JP
Inventors: 一男米倉; Kazuo Yonekura; 均服部; Hitoshi Hattori
Original assignee: IHI Corp
Current assignee: IHI Corp
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2020-03-05

Abstract

To reduce the processing load of a computer that designs a tangible object. SOLUTION: A design system includes at least one processor. The at least one processor inputs each of a plurality of pieces of initial data indicating a structure of a tangible object into a calculation model and repeatedly executes reinforcement learning. Then, in response to a predetermined end condition of the reinforcement learning being satisfied, the at least one processor acquires the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object. SELECTED DRAWING: Figure 2

Description

One aspect of the present invention relates to a design system, a learned model generation method, and a design program.

Methods of designing a tangible object using a computer have long been known. For example, Patent Literature 1 describes a pneumatic tire design method in which a conversion system that relates the nonlinear correspondence between tire design parameters and tire performance is built from data of a multilayer feedforward neural network. Patent Literature 2 describes a rotational position control method for a multi-degree-of-freedom ultrasonic motor that performs feedback control for each rotation direction and enables an inverse motion model to be constructed and updated during operation.

Patent Literature 1: JP 2001-009838 A
Patent Literature 2: Japanese Patent No. 4209291

In the related art, it is very difficult to obtain a highly versatile calculation model for a given tangible object. Specifically, a calculation model that can derive a good solution under specific conditions (for example, conditions external to the tangible object, such as wind speed, temperature, and vibration input) cannot obtain a good solution under other conditions. The calculation model therefore has to be rebuilt from scratch to match the conditions required of the tangible object, which increases the processing load of the computer. It is thus desirable to reduce the processing load of a computer that designs a tangible object.

A design system according to one aspect of the present invention includes at least one processor. The at least one processor inputs each of a plurality of pieces of initial data indicating a structure of a tangible object into a calculation model and repeatedly executes reinforcement learning, and, in response to a predetermined end condition of the reinforcement learning being satisfied, acquires the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

A learned model generation method according to one aspect of the present invention includes: a step of inputting each of a plurality of pieces of initial data indicating a structure of a tangible object into a calculation model and repeatedly executing reinforcement learning; and a step of acquiring, in response to a predetermined end condition of the reinforcement learning being satisfied, the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

A design program according to one aspect of the present invention causes a computer to execute: a step of inputting each of a plurality of pieces of initial data indicating a structure of a tangible object into a calculation model and repeatedly executing reinforcement learning; and a step of acquiring, in response to a predetermined end condition of the reinforcement learning being satisfied, the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

In such aspects, a learned model is obtained by repeating reinforcement learning using a plurality of pieces of initial data. Because this learned model is obtained by processing various initial data, it can cope with various conditions. That is, a single learned model suffices for designing the tangible object under each of those conditions. Since a calculation model no longer has to be built from scratch for each condition, the amount of computer processing required to build calculation models decreases. The processing load of the computer that designs the tangible object can therefore be reduced.

In a design system according to another aspect, the at least one processor may, for each of the plurality of pieces of initial data, repeat the reinforcement learning while changing at least a part of the input data of the calculation model. Because the design system repeats the reinforcement learning while automatically changing the input data derived from a single piece of initial data, the amount of initial data that must be prepared in advance can be reduced.

In a design system according to another aspect, changing at least a part of the input data may include a process of randomly changing at least a part of the input data. Randomly changing the input data makes it possible to generate a learned model that handles various conditions.

In a design system according to another aspect, changing at least a part of the input data may include a process of changing at least a part of the input data based on the output data of the previous reinforcement learning. Setting the next input data based on the result of the reinforcement learning allows the learned model to be generated efficiently.

In a design system according to another aspect, the tangible object may be selected from at least a part of a building, at least a part of a structure, and at least a part of a moving body. In this case, the processing load of the computer when designing a building, a structure, or a moving body can be reduced.

In a design system according to another aspect, the at least one processor may output design data of the tangible object by inputting input data indicating the structure of the tangible object into the learned model. In this case, the tangible object can be designed while the processing load of the computer is reduced.

According to one aspect of the present invention, the processing load of a computer that designs a tangible object can be reduced.

FIG. 1 is a diagram illustrating an example of a hardware configuration of a computer used in the design system according to the embodiment.
FIG. 2 is a diagram illustrating an example of a functional configuration of the design system according to the embodiment.
FIG. 3 is a flowchart illustrating the learning process of the design system according to the embodiment.
FIG. 4 is a flowchart illustrating the optimization process of the design system according to the embodiment.
FIG. 5 is a graph illustrating an example of reinforcement learning in the second embodiment.
FIG. 6 is a diagram for explaining the design system according to the third embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are denoted by the same reference signs, and redundant description is omitted.

[System Overview]
A design system according to the present disclosure is a computer system that designs a tangible object. More specifically, the design system designs the structure of a tangible object. A tangible object is an object that physically occupies part of space and has a shape. For example, a tangible object can be any object that is artificially manufactured or built, that is, any artifact. The tangible object may be movable property or real estate. The tangible object may be a building, or a structure such as a bridge. Alternatively, the tangible object may be a moving body such as a motorcycle, a bicycle, a train, a surface vehicle, an underwater vehicle, or a flying object. The tangible object may be at least a part of a building, at least a part of a structure, or at least a part of a moving body. Thus, for example, the tangible object may be a device or a component that constitutes a moving body. Of course, the tangible object is not limited to these examples. Structure refers to the way in which the elements constituting a tangible object are combined. Examples of elements that define a structure include shape, dimensions, arrangement, and material, but the elements are not limited to these. Design refers to the preparation for realizing a tangible object. Typically, the products of design include design drawings, specifications, and models that show the structure of the intended tangible object.

The design system uses reinforcement learning. Reinforcement learning is a type of machine learning that autonomously discovers laws or rules by learning iteratively from given information. In reinforcement learning, an agent in an environment observes the current state and decides what action to take. By selecting an action, the agent obtains a reward from the environment. Through a series of actions, the agent learns a policy that yields the greatest reward. By accumulating learning, the agent becomes able to output an optimal solution. Note that an optimal solution here means a solution that is estimated to be optimal, and is not necessarily the solution that is optimal in reality.

The design system can generate a calculation model that includes an agent that has accumulated learning, and acquire this calculation model as a learned model. This corresponds to the learning phase. Note that the learned model is a calculation model estimated to be optimal for designing the structure of a tangible object, and is not necessarily a "calculation model that is optimal in reality". A calculation model can be built using algorithms and data structures. For example, the calculation model may be built to include a neural network, which is an information processing model that imitates the workings of the human cranial nervous system.

The design system can also output optimal design data by processing input data with the learned model. This corresponds to the optimization phase. As noted above, the optimal design data is not necessarily optimal in reality.

For both the calculation model and the learned model, the configuration of the input data and the output data is not limited in any way. For example, both the input data and the output data can be represented by a vector composed of one or more parameters. The input data may be supplied from outside the calculation model; in the present disclosure such input data is also referred to as initial data. Alternatively, the input data may be set by the calculation model.

If the agent accumulates learning under various states, it becomes able to output an optimal solution under various conditions. The design system can therefore generate a learned model that handles various design conditions. This means that the learned model can be reused under different design conditions. Because the calculation model does not have to be rebuilt when the conditions change slightly, the amount of computer processing required to build calculation models decreases. The processing load of the computer that designs the tangible object can therefore be reduced. When the structure of a tangible object is designed with a conventional technique such as a genetic algorithm, a calculation model has to be built from scratch every time the design conditions change. The conventional technique therefore increases the processing load of the computer and places a heavy burden on the designer. Compared with the conventional technique, the design system according to the present disclosure has the advantageous technical effect of reducing both the processing load of the computer and the burden on the designer.

The learned model is portable between computer systems. A learned model generated by one computer system can therefore be used by another computer system. Of course, a single computer system may both generate and use the learned model. That is, the design system may execute both the learning phase and the design phase, or may execute only one of them.

[First Embodiment]
An example of the general configuration and operation of the above design system will be described as a first embodiment with reference to FIGS. 1 to 4. FIG. 1 is a diagram illustrating a general hardware configuration of a computer 100 constituting a design system 1 according to the first embodiment. FIG. 2 is a diagram illustrating an example of a functional configuration of the design system 1. FIG. 3 is a flowchart illustrating an example of the learning process of the design system 1. FIG. 4 is a flowchart illustrating an example of the optimization process of the design system 1. FIGS. 3 and 4 correspond to the learning phase and the optimization phase, respectively.

As shown in FIG. 1, the computer 100 may include a processor 101, a main storage unit 102, an auxiliary storage unit 103, a communication control unit 104, an input device 105, and an output device 106. The processor 101 executes an operating system and application programs. The main storage unit 102 is composed of, for example, a ROM and a RAM. The auxiliary storage unit 103 is composed of, for example, a hard disk or a flash memory, and generally stores a larger amount of data than the main storage unit 102. The communication control unit 104 is composed of, for example, a network card or a wireless communication module. The input device 105 is composed of, for example, a keyboard, a mouse, or a touch panel. The output device 106 is composed of, for example, a monitor and a speaker.

Each functional element of the design system 1 is realized by a design program 110 stored in advance in the auxiliary storage unit 103. Specifically, each functional element is realized by loading the design program 110 onto the processor 101 or the main storage unit 102 and executing it. In accordance with the design program 110, the processor 101 operates the communication control unit 104, the input device 105, or the output device 106, and reads and writes data in the main storage unit 102 or the auxiliary storage unit 103. The data or databases required for processing are stored in the main storage unit 102 or the auxiliary storage unit 103.

The design program 110 may be provided after being fixedly recorded on a tangible recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, the design program 110 may be provided via a communication network as a data signal superimposed on a carrier wave.

The design system 1 may be composed of a single computer 100 or of a plurality of computers 100. When a plurality of computers 100 are used, they are connected via a communication network such as the Internet or an intranet, so that one design system 1 is logically constructed.

In the present embodiment, the design system 1 includes a first system 10 that generates a learned model for designing the structure of a tangible object, and a second system 20 that outputs optimal design data using that learned model. The first system 10 includes, as a functional element, a learning unit 11 that generates and acquires the learned model by executing reinforcement learning. The second system 20 includes, as a functional element, an optimization unit 21 that calculates optimal design data from input data using the learned model.

As described above, the learned model is portable between computer systems. Therefore, as in the present embodiment, the calculation model generated by the first system 10 can be used by the second system 20, which is a separate computer system. Of course, dividing the design system into the first system 10 and the second system 20 is not essential.

An example of a method of generating a learned model will be described with reference to FIG. 3. In this example, the data input to the calculation model for reinforcement learning is denoted, using variables i and n, by an input vector x_i(n). The input vector x_i(n) is data indicating the structure of a tangible object and is an example of input data. The variable i is a value that distinguishes the initial vectors and is assumed to start from 1. Here, an initial vector is input data prepared outside the calculation model for reinforcement learning, and is an example of initial data. The variable n is a value indicating the number of learning iterations based on one initial vector. For any value of i, x_i(0) denotes the initial vector, whereas x_i(1), x_i(2), and so on denote new input vectors obtained inside the calculation model by changing the previous input vector x_i(n-1). The data structure of the input vector x_i(n) is not limited and may be set arbitrarily based on the data structure of the design data to be finally obtained. For example, the input vector x_i(n) may include one or more parameters. In addition to parameters indicating the structure of the tangible object, the input vector x_i(n) may include parameters indicating environmental conditions relating to the environment surrounding the tangible object. Examples of environmental conditions include flow velocity, air temperature, and pressure, but the elements constituting the environmental conditions may be selected arbitrarily. Including environmental conditions in the input vector makes it possible to generate a learned model applicable to various environmental conditions.

In step S11, the learning unit 11 sets the i-th initial vector x_i(0). The method of setting the initial vector is not limited. For example, the learning unit 11 may set the initial vector x_i(0) randomly. Alternatively, the learning unit 11 may set an externally provided input vector as the initial vector x_i(0) without modification. For example, the learning unit 11 may set data entered by a user of the design system 1 as the initial vector x_i(0). Alternatively, the learning unit 11 may set data read from an arbitrary storage unit such as a database as the initial vector x_i(0).

In step S12, the learning unit 11 obtains a performance value f_i(0) based on the initial vector x_i(0). This performance value f_i(0) can be used to determine the reward for the agent. The data structure of the performance value is not limited and may be set arbitrarily in correspondence with the input vector. For example, the performance value may include one or more parameters. Because using the performance value f_i(0) to determine the reward is not essential, step S12 may be omitted.

Thereafter, for the initial vector x_i(0), the learning unit 11 repeats the reinforcement learning while changing at least a part of the input data of the calculation model. For example, the learning unit 11 repeatedly executes the reinforcement learning a predetermined number of times while gradually changing the initial vector x_i(0). However, the learning unit 11 does not change the environmental conditions set in a given initial vector x_i(0) among the input vectors x_i(n) that share the same value of i. This iterative process is described below as steps S13 to S16. The number of iterations may be set arbitrarily and may be, for example, 20 or 50. In the present embodiment, the number of iterations is denoted by M.
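As an illustration only, the input vector described above could be represented as follows. This is a minimal sketch: the class name, the field names, and the split into `structure` and `environment` tuples are assumptions made for the example, since the description leaves the data structure open. The `advance` method reflects the stated constraint that the environmental conditions stay fixed for a given i.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class InputVector:
    """One input vector x_i(n) (all field names are hypothetical)."""
    i: int                                # index of the initial vector, starting at 1
    n: int                                # iteration count; n == 0 is the initial vector
    structure: Tuple[float, ...]          # structural parameters, e.g. dimensions
    environment: Tuple[float, ...] = ()   # environmental conditions, e.g. flow velocity

    def as_tuple(self) -> Tuple[float, ...]:
        """Flatten to the vector that would be fed to the calculation model."""
        return self.structure + self.environment

    def advance(self, new_structure: Sequence[float]) -> "InputVector":
        """Return x_i(n+1): new structure, same i, same environmental conditions."""
        return replace(self, n=self.n + 1, structure=tuple(new_structure))


x0 = InputVector(i=1, n=0, structure=(1.0, 2.0), environment=(30.0,))
x1 = x0.advance([1.1, 2.0])
```

Making the class immutable keeps every x_i(n) available for later comparison, which is convenient when performance values of successive vectors are compared.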

In step S13, the learning unit 11 calculates an output vector by executing reinforcement learning on the input vector x_i(n). Immediately after step S12, the learning unit 11 executes reinforcement learning on the initial vector x_i(0). The output vector is data indicating the action the agent should take, and is an example of output data. The data structure of the output vector is not limited and may be set arbitrarily in correspondence with the input vector.

In step S14, the learning unit 11 sets the next input vector x_i(n+1) based on the output vector. If the previous input vector is x_i(0), the learning unit 11 sets the input vector x_i(1). The method of setting the next input vector is not limited. For example, the learning unit 11 may set the next input vector x_i(n+1) by changing at least a part of the previous input vector x_i(n). For example, the learning unit 11 may set the next input vector x_i(n+1) by randomly changing at least a part of the previous input vector x_i(n) based on the output vector of the previous reinforcement learning. Alternatively, for example, the learning unit 11 may set the next input vector x_i(n+1) by randomly changing, based on the output vector, at least a part of the previous input vector x_i(n) within a predetermined numerical range.
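One possible reading of the last variant of step S14 is sketched below. It rests on assumptions that are not in the description: the output vector is taken to hold one signed direction per parameter, the random part is a step size drawn uniformly from [0, max_step], and each parameter is clamped to a per-parameter range. The function name `next_input` and all of its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def next_input(prev: Sequence[float],
               action: Sequence[float],
               bounds: Sequence[Tuple[float, float]],
               max_step: float = 0.1,
               rng: Optional[random.Random] = None) -> List[float]:
    """Build x_i(n+1) from x_i(n) and the output vector (hypothetical scheme)."""
    rng = rng or random.Random()
    nxt = []
    for value, direction, (low, high) in zip(prev, action, bounds):
        # Random step size; the output vector only decides the direction.
        step = rng.uniform(0.0, max_step) * direction
        # Keep the parameter inside its predetermined numerical range.
        nxt.append(min(high, max(low, value + step)))
    return nxt


rng = random.Random(0)
x_next = next_input([0.5, 0.5], [1.0, -1.0], [(0.0, 1.0), (0.0, 1.0)], rng=rng)
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing learning runs.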

In step S15, the learning unit 11 obtains a performance value f_i(n+1) based on the input vector x_i(n+1) and, based on this performance value f_i(n+1), determines the reward to be given to the agent. The learning unit 11 gives the reward to the agent based on that determination. The determination and granting of the reward in step S15 may be executed at any timing. During the M iterations based on one piece of initial data, the learning unit 11 may give a reward at any timing. For example, the learning unit 11 may give a reward in every one of the M iterations, or only in the M-th iteration. The method of determining the reward and its specific values are not limited. For example, the learning unit 11 may set the reward to +1 when f_i(M) > f_i(0) and to 0 otherwise. Alternatively, the learning unit 11 may set the reward to +1 when f_i(n) > f_i(n-1) and to -1 when f_i(n) < f_i(n-1). The value of the reward may also be calculated from the performance value.
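The two example reward rules of step S15 can be written down directly. The sketch below assumes scalar performance values for which larger is better; the function names are hypothetical, and the reward of 0 for equal performance values in the stepwise rule is an assumption, since the description does not cover that case.

```python
def terminal_reward(f_initial: float, f_final: float) -> int:
    """Reward given once, after the M-th iteration: +1 if f_i(M) > f_i(0), else 0."""
    return 1 if f_final > f_initial else 0


def stepwise_reward(f_prev: float, f_curr: float) -> int:
    """Reward given every iteration: +1 on improvement, -1 on deterioration."""
    if f_curr > f_prev:
        return 1
    if f_curr < f_prev:
        return -1
    return 0  # assumption: the source does not specify the reward for a tie
```

The terminal rule gives sparse feedback tied to the end result, while the stepwise rule gives feedback on every change to the input vector.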

As shown in step S16, the processing of steps S13 to S15 is repeated M times.

In step S17, the learning unit 11 determines whether an end condition is satisfied. The end condition is the condition for ending the reinforcement learning and acquiring the learned model. The end condition may be set arbitrarily. For example, the learning unit 11 may end the reinforcement learning when the performance value f_i(M) satisfies a predetermined criterion. Alternatively, the learning unit 11 may end the reinforcement learning when a predetermined number of pieces of initial data have been processed. The number of pieces of initial data prepared in advance is not limited in any way and may be, for example, 10,000. In either case, satisfaction of the end condition can correspond to design data that satisfies the end condition having been obtained. The design data that satisfies the end condition may be indicated by, for example, the last input vector processed.

If the end condition is not satisfied, the process proceeds to step S18. The learning unit 11 increments the variable i, returns to step S11, and sets the next initial vector x_i(0). The learning unit 11 then executes the processing from step S12 onward again.

If the end condition is satisfied, the process proceeds to step S19. In step S19, the learning unit 11 acquires the calculation model that has undergone reinforcement learning as the learned model. This means that a learned model has been generated. The learning unit 11 outputs the learned model. The method of outputting the learned model is not limited. For example, the learning unit 11 may store the learned model in a storage device such as a memory or a database, or may transmit it to another computer system.
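The flow of steps S11 to S19 can be sketched end to end on a toy problem. Everything concrete here is an assumption made for illustration and is not taken from the description: the input vector is a single integer parameter x plus one integer environmental condition env, the performance value is -|x - env|, the "calculation model" is a small value table updated with a one-step rule and optimistic initial values, and the end condition is a fixed number of initial vectors. The description does not specify the learning algorithm.

```python
import random

ACTIONS = (-1, 0, +1)  # hypothetical actions: decrease, keep, or increase the parameter


def performance(x, env):
    """Hypothetical performance value f (higher is better, best at x == env)."""
    return -abs(x - env)


def observe(x, env):
    """Hypothetical state seen by the agent: the sign of (env - x)."""
    return (env > x) - (env < x)


def greedy(q, state):
    """Action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(num_initial=200, M=20, alpha=0.2, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The "calculation model": a value table, optimistically initialised to 1.0.
    q = {(s, a): 1.0 for s in (-1, 0, 1) for a in ACTIONS}
    for i in range(1, num_initial + 1):        # S18: move on to the next i
        env = rng.randint(0, 10)               # environmental condition, fixed for this i
        x = rng.randint(0, 10)                 # S11: initial vector x_i(0)
        f_prev = performance(x, env)           # S12: performance value f_i(0)
        for n in range(M):                     # S16: repeat S13 to S15 M times
            s = observe(x, env)
            explore = rng.random() < epsilon
            a = rng.choice(ACTIONS) if explore else greedy(q, s)  # S13: output (action)
            x = x + a                          # S14: next input x_i(n+1)
            f_curr = performance(x, env)       # S15: performance value and reward
            reward = (f_curr > f_prev) - (f_curr < f_prev)
            q[(s, a)] += alpha * (reward - q[(s, a)])
            f_prev = f_curr
        # S17: the end condition here is "num_initial initial vectors processed"
    return q                                   # S19: the learned model


trained = train()
```

Because env is drawn afresh for every i, the table is trained across many conditions rather than for a single one, which is the point the description makes about reusing one learned model.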

図４を参照しながら、学習済みモデルを用いた最適化方法の一例を説明する。この例では、最適な設計データを得るための入力ベクトルを、変数ｎを用いてｘ（ｎ）で示す。この入力ベクトルｘ（ｎ）は、有体物の構造を示すデータである。入力ベクトルｘ（ｎ）のデータ構造は、学習部１１による強化学習で用いられる入力ベクトルｘ_ｉ（ｎ）と同じである。 An example of the optimization method using the learned model will be described with reference to FIG. In this example, an input vector for obtaining optimal design data is indicated by x (n) using a variable n. The input vector x (n) is data indicating the structure of a tangible object. The data structure of the input vector x (n) is the same as the input vector x _i (n) used in reinforcement learning by the learning unit 11.

ステップＳ２１では、最適化部２１は入力ベクトルｘ（０）を受け付ける。入力ベクトルｘ（０）の入手方法は限定されない。例えば、最適化部２１は、設計システム１のユーザにより入力されたデータを入力ベクトルｘ（０）として受け付けてもよい。あるいは、最適化部２１は、データベースなどの任意の記憶部から読み出したデータを入力ベクトルｘ（０）として受け付けてもよい。 In step S21, the optimization unit 21 receives an input vector x (0). The method of obtaining the input vector x (0) is not limited. For example, the optimization unit 21 may receive data input by a user of the design system 1 as an input vector x (0). Alternatively, the optimization unit 21 may receive data read from an arbitrary storage unit such as a database as the input vector x (0).

In step S22, the optimization unit 21 calculates an output vector by inputting the input vector x(n) to the learned model. Immediately after step S21, the optimization unit 21 inputs the input vector x(0). The data structure of the output vector is the same as that of the output vector obtained from the calculation model in the reinforcement learning by the learning unit 11.

ステップＳ２３では、最適化部２１はその出力ベクトルに基づいて次の入力ベクトルｘ（ｎ＋１）を設定する。 In step S23, the optimizing unit 21 sets the next input vector x (n + 1) based on the output vector.

ステップＳ２４では、最適化部２１はその入力ベクトルｘ（ｎ＋１）に基づいて性能値ｆ（ｎ＋１）を求める。この処理は省略されてもよい。 In step S24, the optimization unit 21 calculates a performance value f (n + 1) based on the input vector x (n + 1). This processing may be omitted.

ステップＳ２５では、最適化部２１は終了条件を満たすか否かを判定する。この終了条件は、最適な設計データを出力するための条件である。この終了条件は、学習済みモデルを生成するために用いられた終了条件に対応するように設定されてもよい。あるいは、終了条件は性能値が満たすべき基準で定義されてもよい。あるいは、終了条件はステップＳ２２〜Ｓ２４の繰り返し回数で定義されてもよい。 In step S25, the optimization unit 21 determines whether or not the end condition is satisfied. This termination condition is a condition for outputting optimal design data. This end condition may be set to correspond to the end condition used to generate the learned model. Alternatively, the termination condition may be defined by a criterion that the performance value should satisfy. Alternatively, the termination condition may be defined by the number of repetitions of steps S22 to S24.

終了条件を満たさない場合には、最適化部２１はステップＳ２２以降の処理を再び実行する。一方、終了条件を満たす場合には、処理はステップＳ２６に進む。ステップＳ２６では、最適化部２１が、学習済みモデルによって最後に処理された入力ベクトル、すなわち、終了条件を満たす入力ベクトルを設計データとして出力する。設計データの出力方法は限定されない。例えば、最適化部２１は設計データを、メモリ、データベースなどの記憶装置に格納してもよいし、他のコンピュータシステムに送信してもよいし、モニタ上に表示してもよい。設計データは有体物の構造を設計するために用いられる。 If the termination condition is not satisfied, the optimization unit 21 executes the processing of step S22 and subsequent steps again. On the other hand, if the termination condition is satisfied, the process proceeds to step S26. In step S26, the optimization unit 21 outputs, as design data, an input vector that has been processed last by the learned model, that is, an input vector that satisfies the termination condition. The output method of the design data is not limited. For example, the optimization unit 21 may store the design data in a storage device such as a memory or a database, transmit the design data to another computer system, or display the design data on a monitor. The design data is used to design the structure of a tangible object.
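The optimization procedure of steps S21 to S26 can be sketched in the same spirit. The names `optimize`, `trained_model`, and `apply_action` are hypothetical, and the iteration cap `max_iters` is an assumption standing in for the repetition-count form of the termination condition.

```python
def optimize(x0, trained_model, apply_action, evaluate=None, target=None, max_iters=100):
    """Sketch of steps S21-S26 (all callables are hypothetical placeholders)."""
    x = list(x0)                               # S21: accept the input vector x(0)
    for _ in range(max_iters):                 # S25: repetition-count termination
        action = trained_model(x)              # S22: output vector from the learned model
        x = apply_action(x, action)            # S23: set the next input vector x(n+1)
        if evaluate is not None and target is not None:
            if evaluate(x) >= target:          # S24 (optional) and S25: criterion on f
                break
    return x                                   # S26: last input vector as design data
```

Passing `evaluate=None` skips step S24, matching the statement that the performance evaluation may be omitted.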

[Second embodiment]
An application example of the operation of the design system 1 described in the first embodiment will be described as a second embodiment, again with reference to FIGS. 3 and 4. In the present embodiment, the shape of a wing of a flying object is used as an example of the structure of a tangible object. That is, in the present embodiment, the design system 1 acquires a learned model for designing the shape of the wing. Further, the design system 1 uses the learned model to output design data relating to the shape of the wing from prepared input data.

In the present embodiment, the design data indicating the shape of the wing consists of six parameters p(n), q(n), r(n), s(n), t(n), and u(n). That is, x(n) = (p(n), q(n), r(n), s(n), t(n), u(n)). The initial values of these six parameters can be determined using the shapes of seven types of wings W_a, W_b, W_c, W_d, W_e, W_f, and W_g that have conventionally been recognized as having high performance. Letting z_b be the difference in shape between wing W_a and wing W_b, a new shape W_new can be created using a variable p. This new shape is expressed as W_new = W_a + p z_b. The p used in this equation corresponds to the parameter p(n). Similarly, letting z_c be the difference in shape between wing W_a and wing W_c, a new shape W_new can be created using a variable q. This new shape is expressed as W_new = W_a + q z_c. The q used in this equation corresponds to the parameter q(n). Likewise, the parameters r, s, t, and u are obtained based on the differences between wing W_a and the other four types of wings W_d, W_e, W_f, and W_g. These six parameters indicate the length and thickness of the wing.
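The shape parameterization can be illustrated with a short sketch. Several points here are assumptions rather than statements from this publication: a shape is represented as a flat list of coordinates, each difference is taken as the other wing minus W_a (so that a weight of 1 reproduces the other wing), and all six weighted differences are added to W_a at once. The name `blend_shape` is hypothetical.

```python
def blend_shape(base, others, params):
    """Return base + sum_k params[k] * (others[k] - base), element by element.

    base   : coordinates of the reference wing W_a (assumed flat list of floats)
    others : coordinates of the other wings (W_b, W_c, ...), same length as base
    params : weights (p, q, r, ...), one per entry of `others`
    """
    if len(others) != len(params):
        raise ValueError("one weight is required per reference wing")
    new_shape = list(base)
    for other, weight in zip(others, params):
        for k in range(len(base)):
            # z = other - base is the shape difference; add the weighted difference
            new_shape[k] += weight * (other[k] - base[k])
    return new_shape
```

With a single entry in `others` and `params`, this reduces to the single-parameter form W_new = W_a + p z_b given in the text.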

The processing in the learning unit 11 will be described with reference to FIG. 3. In step S11, the learning unit 11 sets the i-th initial vector x_i(0) = (p_i(0), q_i(0), r_i(0), s_i(0), t_i(0), u_i(0)). In step S12, the learning unit 11 obtains a performance value f_i(0) based on the input vector x_i(0). However, step S12 may be omitted.

その後、学習部１１は入力データを少しずつ変えながら強化学習を繰り返す。ステップＳ１３では、学習部１１は入力ベクトルｘ_ｉ（ｎ）に対する強化学習を実行することで出力ベクトルを算出する。本実施形態では、計算モデルを構成するニューラルネットワークの出力層は、６個のパラメータに対応する６個のノードで構成される。それぞれのノードは、対応するパラメータを変更するか否かを示す値を出力する。ステップＳ１４では、学習部１１はその出力ベクトルに基づいて次の入力ベクトルｘ_ｉ（ｎ＋１）を設定する。具体的には、学習部１１は、６個のパラメータのうち、変更すると判定されたパラメータを、任意の方法で変更する。ステップＳ１５では、学習部１１は入力ベクトルｘ_ｉ（ｎ＋１）に基づいて性能値ｆ_ｉ（ｎ＋１）を求め、この性能値ｆ_ｉ（ｎ＋１）に基づいて、エージェントに付与する報酬を決定する。このステップＳ１５は省略可能である。ステップＳ１６で示すように、ステップＳ１３〜Ｓ１５の処理はＭ回繰り返して実行される。 Thereafter, the learning unit 11 repeats the reinforcement learning while changing the input data little by little. In step S13, the learning unit 11 calculates an output vector by executing reinforcement learning on the input vector x _i (n). In the present embodiment, the output layer of the neural network forming the calculation model is configured with six nodes corresponding to the six parameters. Each node outputs a value indicating whether to change the corresponding parameter. In step S14, the learning unit 11 sets the next input vector x _i (n + 1) based on the output vector. Specifically, the learning unit 11 changes a parameter determined to be changed among the six parameters by an arbitrary method. In step S15, the learning unit 11 obtains a performance value f _i (n + 1) based on the input vector x _i (n + 1), and determines a reward to be given to the agent based on the performance value f _i (n + 1). This step S15 can be omitted. As shown in step S16, the processing of steps S13 to S15 is repeatedly executed M times.
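Step S14 in this embodiment can be sketched as follows. The text leaves the method of changing a flagged parameter open ("an arbitrary method"), so the threshold of 0.5 and the small uniform random perturbation used here are purely illustrative assumptions, and `next_input` is a hypothetical name.

```python
import random

def next_input(x, node_outputs, step=0.05, threshold=0.5, rng=random):
    """Build x(n+1) from x(n) and the six output-node values.

    A parameter is changed only when its node output exceeds `threshold`
    (assumed decision rule); the change itself is a uniform random offset
    in [-step, step] (assumed, since the method is left arbitrary).
    """
    return [
        xi + rng.uniform(-step, step) if out > threshold else xi
        for xi, out in zip(x, node_outputs)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the perturbations reproducible.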

In step S17, the learning unit 11 determines whether the termination condition is satisfied. If the termination condition is not satisfied, the process returns to step S11 via step S18. If the termination condition is satisfied, the learning unit 11 acquires the learned model. When the termination condition is that 10,000 initial vectors x_i(0) have been processed, the learning unit 11 acquires the learned model after executing the series of reinforcement learning using the initial vector x_10000(0).

FIG. 5 is a graph showing an example of the reinforcement learning in the second embodiment. The horizontal axis of the graph indicates aerodynamic performance, and the vertical axis indicates vibration performance. Aerodynamic performance and vibration performance are examples of performance values. In the graph, aerodynamic performance improves toward the left and vibration performance improves toward the top. Each of the results 201 to 205 in the graph shows the processing result obtained when the reinforcement learning is repeated M times based on one initial data.

学習を開始して間もない時点では、結果２０１，２０２，２０３に示すように、振動性能および空力性能はＭ回の学習を実行してもほとんど変わらない。しかし、エージェントが報酬を得ながら学習を積み重ねていくと、結果２０４，２０５に示すように、エージェントは初期データから最適解を得ることが可能になる。 Immediately after the learning is started, as shown in the results 201, 202, and 203, the vibration performance and the aerodynamic performance hardly change even if the learning is performed M times. However, as the agent accumulates learning while obtaining a reward, as shown in results 204 and 205, the agent can obtain an optimal solution from the initial data.

The process by which the agent arrives at an optimal solution can vary depending on how the reward is given. For example, the result 204 shows an example in which reinforcement learning is performed under the constraint that a reward equal to the increase in aerodynamic performance is given when the aerodynamic performance can be increased without lowering the vibration performance. In contrast, the result 205 shows an example in which reinforcement learning is performed under the constraint that a reward equal to the increase in aerodynamic performance is given when the vibration performance is at or above a threshold and the aerodynamic performance can be increased. In FIG. 5, the threshold corresponding to the result 205 is indicated by a line 210.
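The two reward rules behind the results 204 and 205 can be written down as a small sketch. Both functions assume that performance is expressed as a score in which a larger value is better, and that the reward is zero when the stated condition is not met; neither assumption is given in the text, and the function names are hypothetical.

```python
def reward_no_vibration_loss(aero_prev, aero_next, vib_prev, vib_next):
    """Rule for result 204: reward the aerodynamic gain only if the
    vibration performance did not decrease (zero otherwise, by assumption)."""
    gain = aero_next - aero_prev
    return gain if gain > 0 and vib_next >= vib_prev else 0.0

def reward_vibration_threshold(aero_prev, aero_next, vib_next, vib_threshold):
    """Rule for result 205: reward the aerodynamic gain only if the
    vibration performance is at or above the threshold (line 210)."""
    gain = aero_next - aero_prev
    return gain if gain > 0 and vib_next >= vib_threshold else 0.0
```

The second rule tolerates a drop in vibration performance as long as it stays above the threshold, which is one way to read why the result 205 follows a different path from the result 204.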

The processing in the optimization unit 21 will be described with reference to FIG. 4. In step S21, the optimization unit 21 sets the initial vector x(0) = (p(0), q(0), r(0), s(0), t(0), u(0)). In step S22, the optimization unit 21 calculates an output vector by inputting the input vector x(n) to the learned model. Immediately after step S21, the optimization unit 21 inputs the initial vector x(0) to the learned model. In step S23, the optimization unit 21 sets the next input vector x(n+1) based on the output vector. In step S24, the optimization unit 21 obtains a performance value f(n+1) based on the input vector x(n+1). As described above, step S24 may be omitted.

ステップＳ２５では、最適化部２１は終了条件を満たすか否かを判定する。終了条件を満たさない場合には、最適化部２１はステップＳ２２以降の処理を繰り返す。終了条件を満たす場合には、処理はステップＳ２６に進み、最適化部２１は設計データを出力する。この結果、設計システム１のユーザは、翼の最適な形状を得ることができる。 In step S25, the optimization unit 21 determines whether or not the end condition is satisfied. If the termination condition is not satisfied, the optimization unit 21 repeats the processing from step S22. If the end condition is satisfied, the process proceeds to step S26, and the optimizing unit 21 outputs the design data. As a result, the user of the design system 1 can obtain the optimum shape of the wing.

[Third embodiment]
Another application example of the operation of the design system 1 described in the first embodiment will be described as a third embodiment, again with reference to FIGS. 3 and 4. In the present embodiment, the angle of attack of a wing of a flying object is used as an example of the structure of a tangible object. That is, the design system 1 acquires a learned model for designing the angle of attack of the wing. Further, the design system 1 uses the learned model to output design data relating to the angle of attack of the wing from prepared input data.

The purpose of the design system 1 in the present embodiment is to obtain the optimal angle of attack α given the wing shape y and the Reynolds number Re. The angle of attack α is selected from {0, 1, 2, ..., 39}. The optimal angle of attack α is the value that maximizes the ratio C_L/C_D of the lift coefficient C_L to the drag coefficient C_D. In the present embodiment, this ratio C_L/C_D is used as the performance value f.
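The objective can be made concrete with a brute-force reference that evaluates every candidate angle. This is not the reinforcement-learning procedure of this embodiment; it only illustrates what "the angle that maximizes C_L/C_D" means. The callable `coefficients`, which maps an angle to a (C_L, C_D) pair, is a hypothetical stand-in for a flow analysis.

```python
def lift_to_drag(c_l, c_d):
    """Performance value f = C_L / C_D."""
    if c_d <= 0:
        raise ValueError("drag coefficient must be positive")
    return c_l / c_d

def best_angle(coefficients, angles=range(40)):
    """Return the angle in {0, 1, ..., 39} with the largest C_L / C_D.

    coefficients(alpha) -> (c_l, c_d) is a hypothetical evaluation function.
    """
    return max(angles, key=lambda alpha: lift_to_drag(*coefficients(alpha)))
```

An exhaustive search like this needs one analysis per candidate angle for every new wing shape and Reynolds number, whereas the learned model is meant to walk toward the optimum from an arbitrary starting angle.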

In the present embodiment, the input data of the calculation model is a normalized color contour image that is the result of executing a two-dimensional steady-state incompressible flow analysis using a k-ε turbulence model. This image is 400 pixels × 400 pixels, and each pixel value, that is, each RGB value, contains airflow information. Each color contour image input to the calculation model for reinforcement learning can be obtained by randomly setting each of the wing shape y, the Reynolds number Re, and the initial angle of attack and then executing the above flow analysis.

FIG. 6 shows an example of the calculation model used in the present embodiment. The calculation model in the present embodiment includes a deep neural network 300 that receives a color contour image 310. The deep neural network 300 includes two sets of a convolutional layer and a pooling layer, followed by three fully connected layers. In FIG. 6, a fully connected layer is denoted by "FC". The number following "FC" indicates the number of nodes constituting that layer. The deep neural network 300 outputs a result indicating whether to raise the angle of attack by 1° or lower it by 1°.

The processing in the learning unit 11 will be described with reference to FIG. 3. In step S11, the learning unit 11 sets the i-th initial vector x_i(0). This initial vector x_i(0) indicates the RGB values of all pixels constituting the color contour image. The angle of attack α and the ratio C_L/C_D are not given to the calculation model, which means that the learning unit 11 can acquire the learned model using only images. In step S12, the learning unit 11 obtains a performance value f_i(0) based on the input vector x_i(0). However, step S12 may be omitted.

Thereafter, the learning unit 11 repeats the reinforcement learning while changing the input data little by little. In step S13, the learning unit 11 calculates an output vector by executing reinforcement learning on the input vector x_i(n). In the present embodiment, the output layer of the neural network constituting the calculation model consists of two nodes indicating whether to raise or lower the angle of attack. In step S14, the learning unit 11 sets the next input vector x_i(n+1) based on the output vector. Specifically, the learning unit 11 generates a new color contour image by executing a numerical analysis based on the output vector, and sets the next input vector x_i(n+1), which indicates the RGB values of all pixels constituting the new color contour image. In step S15, the learning unit 11 obtains a performance value f_i(n+1) based on the input vector x_i(n+1) and determines the reward to be given to the agent based on this performance value f_i(n+1). For example, the learning unit 11 may determine the reward based on a comparison between f_i(n+1) and f_i(0). Step S15 may be omitted.
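The action and reward in this embodiment can be sketched as follows. The text specifies only that the two output nodes indicate raising or lowering the angle of attack by 1° and that the reward may depend on a comparison between f_i(n+1) and f_i(0). Picking the node with the larger value, clamping the angle to the range 0 to 39, and using a reward of +1, 0, or -1 are assumptions made for illustration, and the function names are hypothetical.

```python
def step_angle(alpha, node_outputs, lo=0, hi=39):
    """Apply the network's decision: raise or lower the angle by one degree.

    node_outputs is (raise_score, lower_score); the larger score wins
    (assumed rule). The result is clamped to [lo, hi] (assumed).
    """
    raise_score, lower_score = node_outputs
    delta = 1 if raise_score >= lower_score else -1
    return min(hi, max(lo, alpha + delta))

def comparison_reward(f_next, f_initial):
    """Assumed reward: +1 if f(n+1) beats f(0), -1 if worse, 0 if equal."""
    if f_next > f_initial:
        return 1.0
    if f_next < f_initial:
        return -1.0
    return 0.0
```

In the actual procedure, each new angle would require a new flow analysis to produce the next color contour image; that step is outside this sketch.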

In step S17, the learning unit 11 determines whether the termination condition is satisfied. The termination condition in the present embodiment may be, for example, that the two most recent input vectors are the same, that is, x_i(n) = x_i(n−1). If the termination condition is not satisfied, the process returns to step S11 via step S18. If the termination condition is satisfied, the learning unit 11 acquires the learned model.

図４を参照しながら最適化部２１での処理を説明する。ステップＳ２１では、最適化部２１は、カラー等高線画像で表される初期ベクトルｘ（０）を設定する。ステップＳ２２では、最適化部２１は入力ベクトルｘ（ｎ）を学習済みモデルに入力することで出力ベクトルを算出する。ステップＳ２１の直後は、最適化部２１は初期ベクトルｘ（０）を学習済みモデルに入力する。ステップＳ２３では、最適化部２１はその出力ベクトルに基づいて次の入力ベクトルｘ（ｎ＋１）を設定する。ステップＳ２４では、最適化部２１はその入力ベクトルｘ（ｎ＋１）に基づいて性能値ｆ（ｎ＋１）を求める。上述したように、ステップＳ２４は省略されてもよい。 The processing in the optimization unit 21 will be described with reference to FIG. In step S21, the optimization unit 21 sets an initial vector x (0) represented by a color contour image. In step S22, the optimization unit 21 calculates an output vector by inputting the input vector x (n) to the learned model. Immediately after step S21, the optimization unit 21 inputs the initial vector x (0) to the learned model. In step S23, the optimizing unit 21 sets the next input vector x (n + 1) based on the output vector. In step S24, the optimization unit 21 calculates a performance value f (n + 1) based on the input vector x (n + 1). As described above, step S24 may be omitted.

ステップＳ２５では、最適化部２１は終了条件を満たすか否かを判定する。終了条件は、例えば、直近の二つの入力ベクトルが同じであること、すなわち、ｘ（ｎ）＝ｘ（ｎ−１）であってもよい。終了条件を満たさない場合には、最適化部２１はステップＳ２２以降の処理を繰り返す。終了条件を満たす場合には、処理はステップＳ２６に進み、最適化部２１は設計データを出力する。この結果、設計システム１のユーザは、翼の最適な迎角を得ることができる。 In step S25, the optimization unit 21 determines whether or not the end condition is satisfied. The termination condition may be, for example, that two latest input vectors are the same, that is, x (n) = x (n−1). If the termination condition is not satisfied, the optimization unit 21 repeats the processing from step S22. If the end condition is satisfied, the process proceeds to step S26, and the optimizing unit 21 outputs the design data. As a result, the user of the design system 1 can obtain the optimum angle of attack of the wing.

[Effects]
As described above, a design system according to one aspect of the present invention includes at least one processor. The at least one processor inputs each of a plurality of initial data indicating a structure of a tangible object to a calculation model and repeatedly executes reinforcement learning, and, in response to a predetermined termination condition of the reinforcement learning being satisfied, acquires the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

他の側面に係る設計システムでは、少なくとも一つのプロセッサが、複数の初期データのそれぞれについて、計算モデルの入力データの少なくとも一部を変更しながら強化学習を繰り返してもよい。設計システムが一つの初期データに対して自動的に入力データを変更しながら強化学習を繰り返すので、予め用意する初期データの量を低減することができる。したがって、初期データを記憶するためのメモリの容量を節約することが可能である。 In a design system according to another aspect, at least one processor may repeat reinforcement learning while changing at least a part of input data of a computation model for each of a plurality of initial data. Since the design system repeats the reinforcement learning while automatically changing the input data for one initial data, the amount of the initial data prepared in advance can be reduced. Therefore, it is possible to save the capacity of the memory for storing the initial data.

[Modifications]
The present invention has been described above in detail based on its embodiments and examples. However, the present invention is not limited to the above embodiments and examples. The present invention can be modified in various ways without departing from its gist.

In the second and third embodiments, a wing of a flying object is shown as an example of a tangible object, but as described above, the tangible object is not limited. For example, the design system 1 may generate a learned model for designing an engine of a moving body. More specifically, the design system 1 can generate a learned model for designing a supercharger of a vehicle, for example, a learned model for designing the dimensions of the housing, the dimensions of the blade angle, and the like.

上記実施形態では設計システム１が学習部１１および最適化部２１を備えるが、学習部１１および最適化部２１のいずれか一方が省略されてもよい。 In the above embodiment, the design system 1 includes the learning unit 11 and the optimization unit 21. However, one of the learning unit 11 and the optimization unit 21 may be omitted.

少なくとも一つのプロセッサにより実行される学習済みモデル生成方法の処理手順は上記実施形態での例に限定されない。また、少なくとも一つのプロセッサにより実行される最適化方法の処理手順も上記実施形態での例に限定されない。例えば、上述したステップの一部が省略されてもよいし、別の順序で各ステップが実行されてもよい。また、上述したステップのうちの任意の２以上のステップが組み合わされてもよいし、ステップの一部が修正又は削除されてもよい。あるいは、上記の各ステップに加えて他のステップが実行されてもよい。 The processing procedure of the learned model generation method executed by at least one processor is not limited to the example in the above embodiment. Further, the processing procedure of the optimization method executed by at least one processor is not limited to the example in the above embodiment. For example, some of the steps described above may be omitted, or the steps may be executed in another order. Also, any two or more of the steps described above may be combined, or some of the steps may be modified or deleted. Alternatively, other steps may be executed in addition to the above steps.

When the magnitudes of two numerical values are compared within the design system, either of the two criteria "greater than or equal to" and "greater than" may be used, and either of the two criteria "less than or equal to" and "less than" may be used. The choice of criterion does not change the technical significance of the processing that compares the magnitudes of two numerical values.

1 Design system
100 Computer
101 Processor
110 Design program
10 First system
11 Learning unit
20 Second system
21 Optimization unit

Claims

1. A design system comprising at least one processor,
wherein the at least one processor:
inputs each of a plurality of initial data indicating a structure of a tangible object to a calculation model and repeatedly executes reinforcement learning; and
acquires, in response to a predetermined termination condition of the reinforcement learning being satisfied, the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

2. The design system according to claim 1,
wherein the at least one processor repeats the reinforcement learning, for each of the plurality of initial data, while changing at least a part of the input data of the calculation model.

3. The design system according to claim 2,
wherein changing at least a part of the input data includes a process of randomly changing at least a part of the input data.

4. The design system according to claim 2,
wherein changing at least a part of the input data includes a process of changing at least a part of the input data based on output data of the previous reinforcement learning.

5. The design system according to claim 1,
wherein the tangible object is selected from at least a part of a building, at least a part of a structure, and at least a part of a moving object.

6. The design system according to claim 1,
wherein the at least one processor outputs design data of the tangible object by inputting input data indicating a structure of the tangible object to the learned model.

7. A learned model generation method comprising:
a step of inputting each of a plurality of initial data indicating a structure of a tangible object to a calculation model and repeatedly executing reinforcement learning; and
a step of acquiring, in response to a predetermined termination condition of the reinforcement learning being satisfied, the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.

8. A design program causing a computer to execute:
a step of inputting each of a plurality of initial data indicating a structure of a tangible object to a calculation model and repeatedly executing reinforcement learning; and
a step of acquiring, in response to a predetermined termination condition of the reinforcement learning being satisfied, the calculation model obtained by the reinforcement learning as a learned model for designing the structure of the tangible object.