JP7005872B2

JP7005872B2 - Convergence calculation support device, convergence calculation support method, and convergence calculation support program

Info

Publication number: JP7005872B2
Application number: JP2018013478A
Authority: JP
Inventors: 琢也鈴木
Original assignee: Takenaka Corp
Current assignee: Takenaka Corp
Priority date: 2018-01-30
Filing date: 2018-01-30
Publication date: 2022-01-24
Anticipated expiration: 2038-01-30
Also published as: JP2019133301A

Description

The present invention relates to a convergence calculation support device, a convergence calculation support method, and a convergence calculation support program.

Conventionally, a reinforcement learning system has been proposed that improves the learning speed of an agent in the initial stage of learning (for example, Patent Document 1).

A technique has also been proposed that stores a set of training action sequences, each representing a series of states traversed through actions together with the action taken in each state, and a set of training order evaluation values representing the appropriateness of each training action sequence (for example, Patent Document 2). In this technique, the reward function is updated based on the difference between the stored set of training order evaluation values and the set of order evaluation values corresponding to the reward values, obtained with the reward function, of the series of states represented by the training action sequences.

Japanese Unexamined Patent Publication No. 2006-309519
Japanese Patent No. 5815458

In recent years, advances in analysis technology have led to the use of large models with many nodes and elements. However, nonlinear analysis that requires convergence calculation takes a great deal of time, which is a problem.

However, the techniques described in Patent Document 1 and Patent Document 2 do not consider shortening the analysis time in nonlinear analysis.

In view of the above facts, an object of the present invention is to shorten the analysis time spent on nonlinear analysis.

To achieve the above object, the convergence calculation support device of the present invention includes: a selection unit that selects a solution method according to the current state, using information in which each of a plurality of states in the convergence calculation performed during nonlinear analysis is associated with the action value of each of a plurality of candidate solution methods; and an update unit that updates the action value corresponding to a selected or unselected solution method, using a reward value that depends on the convergence period, which is the period over which the selection of a solution method and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a convergent solution.

According to the convergence calculation support device of the present invention, a solution method is selected according to the action values, and the action value corresponding to a selected or unselected solution method is updated using a reward value that depends on the convergence period. Therefore, performing nonlinear analysis with the updated action values can shorten the analysis time spent on the nonlinear analysis.

In the convergence calculation support device of the present invention, the selection unit may select a solution method according to probabilities weighted by the action values of the plurality of solution methods. This makes solution methods with high action values more likely to be chosen, so the analysis time spent on nonlinear analysis can be shortened further.

To achieve the above object, in the convergence calculation support method of the present invention, a computer executes processing that: selects a solution method according to the current state, using information in which each of a plurality of states in the convergence calculation performed during nonlinear analysis is associated with the action value of each of a plurality of candidate solution methods; and updates the action value corresponding to a selected or unselected solution method, using a reward value that depends on the convergence period, which is the period over which the selection of a solution method and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a convergent solution. Therefore, as with the convergence calculation support device, the analysis time spent on nonlinear analysis can be shortened.

To achieve the above object, the convergence calculation support program of the present invention causes a computer to execute processing that: selects a solution method according to the current state, using information in which each of a plurality of states in the convergence calculation performed during nonlinear analysis is associated with the action value of each of a plurality of candidate solution methods; and updates the action value corresponding to a selected or unselected solution method, using a reward value that depends on the convergence period, which is the period over which the selection of a solution method and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a convergent solution. Therefore, as with the convergence calculation support device, the analysis time spent on nonlinear analysis can be shortened.

According to the present invention, the analysis time spent on nonlinear analysis can be shortened.

FIG. 1 is a block diagram showing an example of the hardware configuration of the convergence calculation support device according to the embodiment.
FIG. 2 is a diagram showing an example of the action value table according to the embodiment.
FIG. 3 is a diagram showing an example of the reward table according to the embodiment.
FIG. 4 is a block diagram showing an example of the functional configuration of the convergence calculation support device according to the embodiment.
FIG. 5 is a diagram for explaining the state transitions according to the embodiment.
FIG. 6 is a diagram for explaining the action value update process according to the embodiment.
FIG. 7 is a flowchart showing an example of the analysis process according to the embodiment.
FIG. 8 is a perspective view showing an example of the model to be analyzed according to the embodiment.
FIG. 9 is a diagram showing an example of the action value table according to a modification.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. In the present embodiment, as an example, reinforcement learning is used for the convergence calculation in a nonlinear analysis that analyzes the displacement of a material when an external force (load) is applied due to an earthquake or the like.

First, the hardware configuration of the convergence calculation support device 10 according to the present embodiment will be described with reference to FIG. 1. The convergence calculation support device 10 is realized by the computer shown in FIG. 1. As shown in FIG. 1, the convergence calculation support device 10 includes a CPU (Central Processing Unit) 11, a memory 12 serving as a temporary storage area, and a non-volatile storage unit 13. The convergence calculation support device 10 also includes a display device 14 such as a liquid crystal display, an input device 15 such as a keyboard and a mouse, and a communication I/F (InterFace) 16 used for communicating with external devices. The CPU 11, the memory 12, the storage unit 13, the display device 14, the input device 15, and the communication I/F 16 are connected to a bus 17. Examples of the convergence calculation support device 10 include information processing devices such as a personal computer or a server computer.

The storage unit 13 stores an action value table 20, a reward table 22, and an analysis program 24. FIG. 2 shows an example of the action value table 20. As shown in FIG. 2, the action value table 20 stores an action value for each combination of one of a plurality of states in the convergence calculation performed during nonlinear analysis and one of a plurality of candidate solution methods. The action value table 20 is an example of information in which each of the plurality of states in the convergence calculation performed during nonlinear analysis is associated with the action value of each of the plurality of candidate solution methods.

In the present embodiment, the initial stiffness method is applied as method A in the example of FIG. 2, the tangent stiffness method as method B, and the BFGS (Broyden, Fletcher, Goldfarb, Shanno) method as method C. Although FIG. 2 illustrates three states, the number of states is not limited to three and may be two, or four or more. Likewise, although FIG. 2 illustrates three solution methods, the number of solution methods is not limited to three and may be two, or four or more. A method that uses the most recent tangent stiffness, that is, the tangent gradient from the last update, may also be applied as a solution method.
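The difference between methods A and B can be illustrated on a single-degree-of-freedom problem. The following is a minimal sketch, assuming a hypothetical softening spring with internal force f(u) = 10u − u³ and a load of 5; the function names and the spring law are illustrative assumptions and do not come from the embodiment. The initial stiffness method reuses the stiffness at u = 0 in every iteration, whereas the tangent stiffness method re-evaluates the stiffness at the current displacement. The BFGS method (method C) is omitted for brevity.

```python
def initial_stiffness_step(u, load, internal_force, k0):
    """One update that reuses the initial stiffness k0 (modified Newton)."""
    return u + (load - internal_force(u)) / k0


def tangent_stiffness_step(u, load, internal_force, tangent):
    """One update that uses the current tangent stiffness (Newton-Raphson)."""
    return u + (load - internal_force(u)) / tangent(u)


# Hypothetical softening spring (assumption for illustration only).
def internal_force(u):
    return 10.0 * u - u ** 3


def tangent(u):
    return 10.0 - 3.0 * u ** 2


LOAD = 5.0
u_a = u_b = 0.0
for _ in range(5):
    u_a = initial_stiffness_step(u_a, LOAD, internal_force, k0=10.0)
    u_b = tangent_stiffness_step(u_b, LOAD, internal_force, tangent)

# Out-of-balance force remaining after five iterations of each method.
residual_a = abs(LOAD - internal_force(u_a))
residual_b = abs(LOAD - internal_force(u_b))
```

In this toy problem the tangent stiffness method needs fewer iterations, but each of its iterations requires rebuilding the stiffness, which is expensive for large models. That trade-off is what makes the choice of method per iteration worth learning.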

FIG. 3 shows an example of the reward table 22. As shown in FIG. 3, the reward table 22 stores reward values corresponding to the convergence period. The reward values in the reward table 22 according to the present embodiment are set so that the shorter the convergence period, the higher the reward. The convergence period is described in detail later.
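A reward table of this kind can be sketched as a banded lookup. In the sketch below, only the band "5 seconds or more and less than 20 seconds gives a reward of 10" is taken from the worked example of the embodiment; the other reward values and the names `PERIOD_BOUNDS`, `REWARDS`, and `reward_for` are assumptions made for illustration.

```python
from bisect import bisect_right

# Band boundaries in seconds. Only the 5 s <= t < 20 s band (reward 10)
# appears in the embodiment's example; the rest is hypothetical.
PERIOD_BOUNDS = [5.0, 20.0]
REWARDS = [20, 10, 0]  # shorter convergence period -> higher reward


def reward_for(period_seconds):
    """Return the reward for a measured convergence period."""
    return REWARDS[bisect_right(PERIOD_BOUNDS, period_seconds)]
```

Using `bisect_right` makes each band closed on the left and open on the right, so a period of exactly 5 seconds falls in the 5 to 20 second band.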

Next, the functional configuration of the convergence calculation support device 10 according to the present embodiment will be described with reference to FIG. 4. As shown in FIG. 4, the convergence calculation support device 10 includes a selection unit 30, a derivation unit 32, and an update unit 34. The CPU 11 of the convergence calculation support device 10 functions as the selection unit 30, the derivation unit 32, and the update unit 34 shown in FIG. 4 by executing the analysis program 24 stored in the storage unit 13.

The selection unit 30 refers to the action value table 20 and selects a solution method according to the current state. In the present embodiment, the selection unit 30 selects the solution method for the current state according to probabilities weighted by the action values in the action value table 20. Specifically, for example, when the current state is "first iteration", the selection unit 30 selects "method C" with a probability of 50% (= 4 ÷ (2 + 2 + 4) × 100). In this case, the selection unit 30 selects "method A" and "method B" each with a probability of 25% (= 2 ÷ (2 + 2 + 4) × 100).
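The weighted selection described above can be sketched as follows. The action values 2, 2, and 4 for the "first iteration" state are taken from the example; the dictionary layout and the function names are assumptions made for illustration, and the sketch assumes all action values are positive.

```python
import random

# Action values for the "first iteration" state, as in the example above.
ACTION_VALUES = {"first iteration": {"A": 2, "B": 2, "C": 4}}


def selection_probabilities(table, state):
    """Probability of each method: its action value over the row total."""
    values = table[state]
    total = sum(values.values())
    return {method: value / total for method, value in values.items()}


def select_method(table, state, rng=random):
    """Draw one method with probability proportional to its action value."""
    values = table[state]
    methods = list(values)
    weights = [values[m] for m in methods]
    return rng.choices(methods, weights=weights, k=1)[0]
```

Because every method with a positive action value keeps a non-zero probability, methods that currently look worse are still tried occasionally, which lets their action values be revised later.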

The derivation unit 32 derives a predicted solution using the solution method selected by the selection unit 30. The update unit 34 measures the convergence period, which is the period over which the selection of a solution method by the selection unit 30 and the derivation of a predicted solution by the derivation unit 32 using the selected solution method are repeated until the derivation unit 32 derives a convergent solution. A convergent solution here means a predicted solution whose error from the true solution is within a predetermined range. In the present embodiment, one selection of a solution method by the selection unit 30 according to the current state, followed by the derivation of a predicted solution by the derivation unit 32 using the selected solution method, is called an "iteration".

FIG. 5 shows an example of the state transitions when the derivation unit 32 derives a convergent solution in five iterations. "State A" in FIG. 5 corresponds to "first iteration" in FIG. 2, "state B" in FIG. 5 corresponds to "error less than threshold" in FIG. 2, and "state C" in FIG. 5 corresponds to "error equal to or greater than threshold" in FIG. 2. In FIG. 2, "error less than (equal to or greater than) threshold" means that the absolute value of the difference between the predicted solution derived by the derivation unit 32 and the true solution is less than (equal to or greater than) a predetermined threshold TH.

In the example of FIG. 5, in the first iteration (state A), the selection unit 30 selects method C, and the derivation unit 32 derives a predicted solution using method C. Because the absolute value of the difference between the derived predicted solution and the true solution is equal to or greater than the threshold TH, the state transitions to state C, and the current state becomes state C.

Next, in the second iteration, the selection unit 30 selects method A, and the derivation unit 32 derives a predicted solution using method A. Because the absolute value of the difference between the derived predicted solution and the true solution is equal to or greater than the threshold TH, the state transitions to state C, and the current state becomes state C. Next, in the third iteration, the selection unit 30 selects method B, and the derivation unit 32 derives a predicted solution using method B. Because the absolute value of the difference between the derived predicted solution and the true solution is again equal to or greater than the threshold TH, the state transitions to state C, and the current state becomes state C.

Next, in the fourth iteration, the selection unit 30 selects method B, and the derivation unit 32 derives a predicted solution using method B. Because the absolute value of the difference between the derived predicted solution and the true solution is less than the threshold TH but the predicted solution is not a convergent solution, the state transitions to state B, and the current state becomes state B. Next, in the fifth iteration, the selection unit 30 selects method B, and the derivation unit 32 derives a predicted solution using method B. Because the absolute value of the difference between the derived predicted solution and the true solution is less than the threshold TH and the predicted solution is a convergent solution, the processing ends. In the example shown in FIG. 5, the convergence period mentioned above corresponds to the time required for these five iterations.
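The state transitions of FIG. 5 can be replayed with a short sketch. The sequence of methods (C, A, B, B, B) follows the example above; the error values, the threshold, the convergence tolerance, and the function names are hypothetical values chosen only so that the same sequence of states results.

```python
def next_state(error, threshold):
    """State B if the error is below the threshold TH, otherwise state C."""
    return "B" if abs(error) < threshold else "C"


def replay(errors, methods, threshold, tolerance):
    """Return the (state, method) pair of each iteration until convergence."""
    state, history = "A", []  # state A is the first iteration
    for error, method in zip(errors, methods):
        history.append((state, method))
        if abs(error) <= tolerance:  # convergent solution reached
            break
        state = next_state(error, threshold)
    return history


# Hypothetical errors after each of the five iterations.
history = replay(
    errors=[5.0, 3.0, 1.5, 0.4, 0.001],
    methods=["C", "A", "B", "B", "B"],
    threshold=1.0,
    tolerance=0.01,
)
```

The resulting history lists state A once, state C three times, and state B once, which is the record the update unit needs when it later distributes the reward.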

The update unit 34 updates the action values corresponding to the selected solution methods using a reward value that depends on the convergence period. In the present embodiment, the update unit 34 first copies the action value table 20 stored in the storage unit 13 to the memory 12. The update unit 34 may instead copy the action value table 20 stored in the storage unit 13 to an area of the storage unit 13 different from the area in which the action value table 20 is stored. In the following, the action value table 20 stored in the memory 12 by this copying is called the "duplicate action value table".

The update unit 34 also refers to the reward table 22 and acquires the reward value corresponding to the measured convergence period. The update unit 34 then updates the action values in the duplicate action value table by distributing the acquired reward value in proportion to the number of times each solution method was selected, according to equation (1) below.
Value distributed (added) to the action value = acquired reward value × (number of selections ÷ total number of iterations) ... (1)

In the example of FIG. 5, method C is selected once in state A, method B is selected once in state B, method A is selected once in state C, and method B is selected twice in state C. Assume that the convergence period in the example of FIG. 5 was 5 seconds or more and less than 20 seconds. Then, as shown in FIG. 6 as an example, "2" (= 10 × (1 ÷ 5)) is added to the action value corresponding to the combination of state A and method C. Similarly, "2" is added to the action value corresponding to the combination of state B and method B, "2" is added to the action value corresponding to the combination of state C and method A, and "4" (= 10 × (2 ÷ 5)) is added to the action value corresponding to the combination of state C and method B.
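Equation (1) and the worked example of FIG. 5 can be checked with a short sketch. The history of (state, method) pairs and the reward value of 10 follow the example; the table layout, the all-zero starting values, and the function name are assumptions made so that the added amounts are easy to read.

```python
from collections import Counter


def distribute_reward(table, history, reward):
    """Add reward x (selection count / total iterations) to each visited cell."""
    total = len(history)
    for (state, method), count in Counter(history).items():
        table[state][method] += reward * count / total
    return table


# (state, method) pairs of the five iterations in FIG. 5.
history = [("A", "C"), ("C", "A"), ("C", "B"), ("C", "B"), ("B", "B")]

# Hypothetical table starting at zero so that only the increments show.
table = {state: {"A": 0.0, "B": 0.0, "C": 0.0} for state in "ABC"}
distribute_reward(table, history, reward=10)
```

The increments sum to the full reward of 10, so a short convergence period raises the visited cells more in total than a long one does.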

In other words, in the present embodiment, the action values are updated so that solution methods selected when the convergence period was short receive higher action values.

Instead of distributing the reward value to the selected solution methods, the update unit 34 may reduce the action values of unselected solution methods, or it may both distribute the reward value to the selected solution methods and reduce the action values of the unselected solution methods.
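The variant that lowers the action values of unselected solution methods could look like the following sketch. The embodiment does not state by how much the values are reduced, so the fixed `penalty`, the positive `floor` (kept so that every method remains selectable under weighted selection), and the function name are all assumptions.

```python
def penalize_unselected(table, history, penalty, floor=0.1):
    """Lower the value of every method never chosen in a visited state."""
    chosen = {}
    for state, method in history:
        chosen.setdefault(state, set()).add(method)
    for state, methods in chosen.items():
        for method in table[state]:
            if method not in methods:
                # Keep the value positive so the method can still be drawn.
                table[state][method] = max(table[state][method] - penalty, floor)
    return table


# Same (state, method) history as the FIG. 5 example.
history = [("A", "C"), ("C", "A"), ("C", "B"), ("C", "B"), ("B", "B")]
table = {state: {"A": 2.0, "B": 2.0, "C": 2.0} for state in "ABC"}
penalize_unselected(table, history, penalty=1.0)
```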

The update unit 34 also repeats the above processing by the selection unit 30, the derivation unit 32, and the update unit 34 for a predetermined number of analysis steps, and stores the duplicate action value table finally obtained in the storage unit 13 as the action value table 20 to be used for the actual analysis. An analysis step here means, for example, a unit of processing executed for each predetermined period (for example, 1/100 second).

Next, the operation of the convergence calculation support device 10 according to the present embodiment will be described with reference to FIG. 7. The convergence calculation support device 10 performs the analysis process shown in FIG. 7 by executing the analysis program 24. The analysis process shown in FIG. 7 is executed, for example, when the user inputs an instruction to execute the analysis program 24 via the input device 15.

In step S10 of FIG. 7, the update unit 34 copies the action value table 20 stored in the storage unit 13 to the memory 12. In step S12, the selection unit 30 selects a test analysis model similar to the actual analysis model. The test analysis model may, for example, be stored in the storage unit 13 in advance or be acquired from an external device via a network. When there are multiple test analysis models similar to the actual analysis model, the selection unit 30 selects one of them. In this case, when step S12 is executed for the second and subsequent times, the selection unit 30 selects a test analysis model that has not yet been selected. The processing from step S14 to step S30 below is performed on the analysis model selected in step S12.

In step S14, the update unit 34 starts measuring the convergence period. In step S16, the selection unit 30 refers to the action value table 20 and selects a solution method according to the current state, as described above. In step S18, the derivation unit 32 derives a predicted solution using the solution method selected by the selection unit 30. In step S20, the derivation unit 32 transitions the state to the next state according to the predicted solution derived in step S18. For example, when the absolute value of the difference between the predicted solution derived in step S18 and the true solution is less than the threshold TH, the next state is "state B". The state reached in step S20 becomes the current state the next time step S16 is executed.

In step S22, the derivation unit 32 determines whether the predicted solution derived in step S18 is a convergent solution. If the determination is negative, the processing returns to step S16; if it is affirmative, the processing proceeds to step S24. In step S24, the update unit 34 ends the measurement of the convergence period. The update unit 34 acquires, as the convergence period, the period from the start of measurement in step S14 to the end of measurement in step S24.

In step S26, the update unit 34 refers to the reward table 22 and acquires the reward value corresponding to the measured convergence period. In step S28, as described above, the update unit 34 updates the action values in the duplicate action value table by distributing the reward value acquired in step S26 in proportion to the number of times each solution method was selected.

In step S30, the update unit 34 determines whether the processing from step S14 to step S28 has been completed for all of the predetermined number of analysis steps. If the determination is negative, the processing returns to step S14; if it is affirmative, the processing proceeds to step S32.

In step S32, the update unit 34 determines whether the processing from step S12 to step S30 has been completed for all test analysis models similar to the actual analysis model. If the determination is negative, the processing returns to step S12; if it is affirmative, the processing proceeds to step S34.

In step S34, the update unit 34 outputs (stores) the duplicate action value table held in the memory 12 to the storage unit 13 as the action value table 20 to be used for the actual analysis. When the processing of step S34 ends, the analysis process ends.
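The flow of steps S10 to S34 can be summarized in a compact sketch. Several parts are assumptions made so that the sketch runs on its own: `ToySolver` is a hypothetical stand-in for the derivation unit that merely shrinks a scalar error at a method-dependent rate, the iteration count stands in for the measured convergence time, and `reward_for` is an arbitrary decreasing function rather than the reward table of FIG. 3. As in the embodiment, selection reads the stored table while updates go to the duplicate.

```python
import copy
import random


class ToySolver:
    """Hypothetical derivation unit: tracks one scalar error per step."""
    RATES = {"A": 0.5, "B": 0.1, "C": 0.3}  # assumed error reduction factors

    def __init__(self, threshold=1.0, tolerance=0.01):
        self.threshold, self.tolerance = threshold, tolerance
        self.error = 10.0

    def start(self):
        self.error = 10.0

    def iterate(self, method):
        """Derive a predicted solution; return (next state, converged)."""
        self.error *= self.RATES[method]
        state = "B" if self.error < self.threshold else "C"
        return state, self.error <= self.tolerance


def train(table, solvers, n_steps, reward_for, rng):
    work = copy.deepcopy(table)                      # S10: duplicate table
    for solver in solvers:                           # S12, S32: test models
        for _ in range(n_steps):                     # S30: analysis steps
            solver.start()                           # S14: start measuring
            state, history, converged = "A", [], False
            while not converged:                     # S22: until convergence
                values = table[state]                # S16: weighted selection
                method = rng.choices(list(values),
                                     weights=list(values.values()), k=1)[0]
                history.append((state, method))
                state, converged = solver.iterate(method)  # S18, S20
            reward = reward_for(len(history))        # S24, S26
            for s, m in history:                     # S28: equation (1)
                work[s][m] += reward / len(history)
    return work                                      # S34: table for actual use


table = {state: {"A": 1.0, "B": 1.0, "C": 1.0} for state in "ABC"}
learned = train(table, [ToySolver()], n_steps=10,
                reward_for=lambda n: 20 - n, rng=random.Random(0))
```

Adding `reward / len(history)` once per recorded iteration is equivalent to equation (1), because a cell selected k times receives k such shares.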

The same processing is executed for the actual analysis model, using the action value table 20 for actual use that was stored in the storage unit 13 by the above analysis process. In this case, steps S16, S18, S20, S22, and S30 of the analysis process shown in FIG. 7 are executed for the actual analysis model. The action values may also be updated in this case by additionally executing steps S10, S14, S24, S26, S28, and S34 of the analysis process shown in FIG. 7. Alternatively, instead of updating the duplicate action value table, the action values in the action value table 20 for actual use may be updated directly. In that case, steps S10 and S34 of the analysis process shown in FIG. 7 need not be executed, and the action values in the action value table 20 for actual use are updated in step S28.

As an example, the analysis time is described for the case where the analysis model shown in FIG. 8 is analyzed under the following analysis conditions (A) to (C).
(A) The material conditions are as follows.
G = 18000 [kPa], ν = 0.3, ρ = 1.8 [t/m³], nonlinear von Mises σy = 50 [kPa], Hardening = 2000.0 [kPa]
Here, G denotes the shear stiffness, ν denotes Poisson's ratio, and ρ denotes the specific gravity.
(B) The boundary conditions are a fixed bottom surface and periodic (repeating) boundaries on the side surfaces.
(C) A load of −2000.0 [kN] is applied to the shaded portion at the center of the top surface shown in FIG. 8, and the number of analysis steps is 10.

Under these analysis conditions, the analysis time was about 35 seconds when only the initial stiffness method was used as the solution method, and about 210 seconds when only the tangent stiffness method was used.

In contrast, under the same analysis conditions, when either the initial stiffness method or the tangent stiffness method was used selectively as in the present embodiment, with the tangent stiffness method used only in the 13th iteration of each analysis step and the initial stiffness method used in all other iterations, the analysis time was about 27 seconds. That is, selectively using multiple solution methods made the analysis shorter than using either method alone.

As described above, according to the present embodiment, the action value corresponding to the selected solution method is updated in the nonlinear analysis using a reward value that depends on the convergence period. Performing the nonlinear analysis with the updated action values can therefore shorten the analysis time spent on the nonlinear analysis.
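
The update described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the update rule of the embodiment: the incremental rule (move the action value toward the reward by a fixed learning rate), the threshold layout of the reward table, and the names reward_for, update_action_values, and learning_rate are all hypothetical.

```python
def reward_for(convergence_period, reward_table):
    """Look up the reward for a convergence period.

    reward_table is a list of (upper_limit, reward) pairs sorted by
    ascending upper_limit; shorter periods earn higher rewards.
    """
    for upper_limit, reward in reward_table:
        if convergence_period <= upper_limit:
            return reward
    return reward_table[-1][1]  # longer than every limit: lowest reward


def update_action_values(table, visited, reward, learning_rate=0.1):
    """Move each visited (state, method) action value toward the reward."""
    for state, method in visited:
        old = table[state][method]
        table[state][method] = old + learning_rate * (reward - old)


# Hypothetical tables; all numbers are illustrative only.
reward_table = [(30.0, 1.0), (100.0, 0.5), (float("inf"), 0.0)]
action_values = {"s0": {"initial": 0.5, "tangent": 0.5}}

reward = reward_for(27.0, reward_table)
update_action_values(action_values, [("s0", "initial")], reward)
```

Only the pairs of state and solution method that were actually visited during the convergence calculation are touched, so methods that contributed to a short convergence period accumulate higher action values over repeated analyses.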

Further, according to the present embodiment, the solution method is selected according to probabilities weighted by the action values. A solution method with a high action value is therefore more likely to be selected, which can further shorten the analysis time spent on the nonlinear analysis.
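
A minimal sketch of such a weighted selection is shown below. It assumes that the action values of the current state are non-negative with a positive sum and that the selection probability is directly proportional to the action value; the function name select_method and the example values are hypothetical.

```python
import random


def select_method(action_values, rng=random):
    """Pick a solution method with probability proportional to its action value."""
    methods = list(action_values)
    weights = [action_values[m] for m in methods]
    return rng.choices(methods, weights=weights, k=1)[0]


# Illustrative action values for one state; a seeded generator keeps the
# draws reproducible.
rng = random.Random(0)
values = {"initial": 0.8, "tangent": 0.2}
counts = {"initial": 0, "tangent": 0}
for _ in range(1000):
    counts[select_method(values, rng)] += 1
```

Because every method with a positive action value keeps a non-zero probability, lower-valued methods are still tried occasionally, which lets their action values be corrected if they turn out to converge quickly.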

In the above embodiment, the case where the action values of the duplicate action value table are updated has been described, but the present invention is not limited to this. The action values of the action value table 20 itself may be updated instead. In this case, steps S10 and S34 of the analysis process shown in FIG. 7 need not be executed, and the action values of the action value table 20 are updated in step S28. In this form, the next execution of step S16 selects the solution method by referring to the action value table 20 whose action values have been updated.

Further, the states of the action value table 20 in the above embodiment may be defined differently. For example, each state of the action value table 20 may be defined by information indicating which iteration is currently being performed. FIG. 9 shows an example of the action value table 20 in this case.

Further, in the above embodiment, the amount of disk I/O may be used as an evaluation criterion for the analysis in addition to the convergence period. In this case, for example, the reward may be made higher as the convergence period becomes shorter and the amount of disk I/O becomes smaller.
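
One way to combine the two criteria is sketched below. The functional form is an assumption chosen only because it decreases monotonically in both quantities; the name combined_reward, the reference scales period_ref and io_ref, and the weight io_weight are hypothetical.

```python
def combined_reward(convergence_period, disk_io, period_ref, io_ref, io_weight=0.5):
    """Reward that grows as the convergence period and the disk I/O shrink.

    period_ref and io_ref are reference scales that make both inputs
    dimensionless; io_weight in [0, 1] balances the two criteria.
    """
    period_score = 1.0 / (1.0 + convergence_period / period_ref)
    io_score = 1.0 / (1.0 + disk_io / io_ref)
    return (1.0 - io_weight) * period_score + io_weight * io_score


# Illustrative values only: seconds for the period, megabytes for the I/O.
fast_light = combined_reward(27.0, 100.0, period_ref=35.0, io_ref=200.0)
slow_light = combined_reward(35.0, 100.0, period_ref=35.0, io_ref=200.0)
fast_heavy = combined_reward(27.0, 400.0, period_ref=35.0, io_ref=200.0)
```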

In the above embodiment, the case where the solution method is selected according to probabilities weighted by the action values has been described, but the present invention is not limited to this. For example, one of the solution methods whose action value is equal to or greater than a predetermined value may be selected at random.
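
This alternative can be sketched as follows. The function name select_above_threshold and the example values are hypothetical, and the fallback used when no method reaches the threshold (choosing the method with the highest action value) is an assumption that the text does not specify.

```python
import random


def select_above_threshold(action_values, threshold, rng=random):
    """Choose uniformly at random among methods whose action value >= threshold."""
    candidates = [m for m, v in action_values.items() if v >= threshold]
    if not candidates:
        # Assumption: if nothing qualifies, fall back to the best method.
        candidates = [max(action_values, key=action_values.get)]
    return rng.choice(candidates)


# Illustrative action values for one state; "method_c" is a placeholder.
values = {"initial": 0.7, "tangent": 0.6, "method_c": 0.1}
rng = random.Random(0)
picks = {select_above_threshold(values, 0.5, rng) for _ in range(200)}
fallback = select_above_threshold(values, 0.9, rng)
```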

Further, although the processing performed by the CPU 11 in the above embodiment has been described as software processing performed by executing a program, it may instead be performed by hardware. The processing performed by the CPU 11 may also be performed by a combination of software and hardware.

Further, in each of the above embodiments, the analysis program 24 has been described as being stored (installed) in the storage unit 13 in advance, but the present invention is not limited to this. The analysis program 24 may be provided in a form recorded on a recording medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The analysis program 24 may also be downloaded from an external device via a network.

10 Convergence calculation support device
11 CPU
12 Memory
13 Storage unit
20 Action value table
22 Reward table
24 Analysis program
30 Selection unit
32 Derivation unit
34 Update unit

Claims

A convergence calculation support device comprising:
a selection unit that selects a solution method according to a current state, using information in which each of a plurality of states in a convergence calculation performed during nonlinear analysis is associated with an action value of each of a plurality of solution methods to be selected; and
an update unit that updates the action value corresponding to a selected or unselected solution method, using a reward value according to a convergence period, the convergence period being a period during which the selection of a solution method by the selection unit and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a converged solution.

The convergence calculation support device according to claim 1, wherein the selection unit selects a solution method according to a weighted probability of the action value of each of the plurality of solution methods.

A convergence calculation support method in which a computer executes a process comprising:
selecting a solution method according to a current state, using information in which each of a plurality of states in a convergence calculation performed during nonlinear analysis is associated with an action value of each of a plurality of solution methods to be selected; and
updating the action value corresponding to a selected or unselected solution method, using a reward value according to a convergence period, the convergence period being a period during which the selection of a solution method and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a converged solution.

A convergence calculation support program that causes a computer to execute a process comprising:
selecting a solution method according to a current state, using information in which each of a plurality of states in a convergence calculation performed during nonlinear analysis is associated with an action value of each of a plurality of solution methods to be selected; and
updating the action value corresponding to a selected or unselected solution method, using a reward value according to a convergence period, the convergence period being a period during which the selection of a solution method and the derivation of a predicted solution by the selected solution method are repeated until the selected solution method derives a converged solution.